Preface



Claude Shannon, the father of Information Theory, described the fundamental 
problem of point-to-point communications in his classic 1948 paper as "that of 
reproducing at one point either exactly or approximately a message selected at 
another point." How engineers solve this problem is the subject of this book. 
But unlike Shannon's general problem, where the message can be an image, a 
sound clip, or a movie, here we restrict ourselves to bits. We thus envision that 
the original message is either a binary sequence to start with, or else that it was 
described using bits by a device outside our control and that our job is to reproduce 
the describing bits with high reliability. The issue of how images or text files are 
converted efficiently into bits is the subject of lossy and lossless data compression 
and is addressed in texts on information theory and on quantization. 

The engineering solutions to the point-to-point communication problem greatly 
depend on the available resources and on the channel between the points. They 
typically bring together beautiful techniques from Fourier Analysis, Hilbert Spaces, 
Probability Theory, and Decision Theory. The purpose of this book is to introduce 
the reader to these techniques and to their interplay. 

The book is intended for advanced undergraduates and beginning graduate stu- 
dents. The key prerequisites are basic courses in Calculus, Linear Algebra, and 
Probability Theory. A course in Linear Systems is a plus but not a must, because 
all the results from Linear Systems that are needed for this book are summarized 
in Chapters 5 and 6. But more importantly, the book requires a certain mathemat- 
ical maturity and patience, because we begin with first principles and develop the 
theory before discussing its engineering applications. The book is for those who 
appreciate the views along the way as much as getting to the destination; who like 
to "stop and smell the roses;" and who prefer fundamentals to acronyms. I firmly 
believe that those with a sound foundation can easily pick up the acronyms and 
learn the jargon on the job, but that once one leaves the academic environment, 
one rarely has the time or peace of mind to study fundamentals. 

In the early stages of the planning of this book I took a decision that greatly 
influenced the project. I decided that every key concept should be unambiguously 
defined; that every key result should be stated as a mathematical theorem; and 
that every mathematical theorem should be correct. This, I believe, makes for 
a solid foundation on which one can build with confidence. But it is also a tall 
order. It required that I scrutinize each "classical" result before I used it in order 
to be sure that I knew what the needed qualifiers were, and it forced me to include 
background material to which the reader may have already been exposed, because
I needed the results "done right." Hence Chapters 5 and 6 on Linear Systems and 
Fourier Analysis. This is also partly the reason why the book is so long. When I 
started out my intention was to write a much shorter book. But I found that to do 
justice to the beautiful mathematics on which Digital Communications is based I 
had to expand the book. 

Most physical layer communication problems are at their core of a continuous- 
time nature. The transmitted physical waveforms are functions of time and not 
sequences synchronized to a clock. But most solutions first reduce the problem to a 
discrete-time setting and then solve the problem in the discrete-time domain. The 
reduction to discrete-time often requires great ingenuity, which I try to describe. 
It is often taken for granted in courses that open with a discrete-time model from 
Lecture 1. I emphasize that most communication problems are of a continuous- 
time nature, and that the reduction to discrete-time is not always trivial or even 
possible. For example, it is extremely difficult to translate a peak-power constraint 
(stating that at no epoch is the magnitude of the transmitted waveform allowed to 
exceed a given constant) to a statement about the sequence that is used to represent 
the waveform. Similarly, in Wireless Communications it is often very difficult to 
reduce the received waveform to a sequence without any loss in performance. 

The quest for mathematical precision can be demanding. I have therefore tried to 
precede the statement of every key theorem with its gist in plain English. Instruc- 
tors may well choose to present the material in class with less rigor and direct the 
students to the book for a more mathematical approach. I would rather have text- 
books be more mathematical than the lectures than the other way round. Having 
a rigorous textbook allows the instructor in class to discuss the intuition knowing 
that the students can obtain the technical details from the book at home. 

The communication problem comes with a beautiful geometric picture that I try 
to emphasize. To appreciate this picture one needs the definition of the inner 
product between energy-limited signals and some of the geometry of the space of 
energy-limited signals. These are therefore introduced early on in Chapters 3 and 4. 
Chapters 5 and 6 cover standard material from Linear Systems. But note the early 
introduction of the matched filter as a mechanism for computing inner products 
in Section 5.8. Also key is Parseval's Theorem in Section 6.2.2 which relates the 
geometric pictures in the time domain and in the frequency domain. 

Chapter 7 deals with passband signals and their baseband representation. We em- 
phasize how the inner product between passband signals is related to the inner 
product between their baseband representations. This elegant geometric relation- 
ship is often lost in the haze of various trigonometric identities. While this topic is 
important in wireless applications, it is not always taught in a first course in Digital 
Communications. Instructors who prefer to discuss baseband communication only 
can skip Chapters 7, 9, 16, 17, 18, 24, 27, and Sections 26.10 and 28.5. But it would
be a shame. 

Chapter 8 presents the celebrated Sampling Theorem from a geometric perspective. 
It is inessential to the rest of the book but is a striking example of the geometric 
approach. Chapter 9 discusses the Sampling Theorem for passband signals. 
Chapter 10 discusses modulation. I have tried to motivate Linear Modulation
and Pulse Amplitude Modulation and to minimize the use of the "that's just how 
it is done" argument. The use of the Matched Filter for detecting (here in the 
absence of noise) is emphasized. This also motivates the Nyquist Theory, which is 
treated in Chapter 11. I stress that the motivation for the Nyquist Theory is not 
to avoid inter-symbol interference at the sampling points but rather to guarantee 
the orthogonality of the time shifts of the pulse shape by integer multiples of the 
baud period. This ultimately makes more engineering sense and leads to cleaner 
mathematics: compare Theorem 11.3.2 with its corollary, Corollary 11.3.4. 

The result of modulating random bits is a stochastic process, a concept which is 
first encountered in Chapter 10; formally defined in Chapter 12; and revisited in 
Chapters 13, 17, and 25. It is an important concept in Digital Communications, 
and I find it best to first introduce man-made synthesized stochastic processes 
(as the waveforms produced by an encoder when fed random bits) and only later 
to introduce the nature-made stochastic processes that model noise. Stationary 
discrete-time stochastic processes are introduced in Chapter 13 and their complex 
counterparts in Chapter 17. These are needed for the analysis in Chapter 14 of the 
power in Pulse Amplitude Modulation and for the analysis in Chapter 17 of the 
power in Quadrature Amplitude Modulation. 

I emphasize that power is a physical quantity that is related to the time-averaged 
energy in the continuous-time transmitted waveform. Its relation to the power in the
discrete-time modulating sequence is a nontrivial result. In deriving this relation 
I refrain from adding random timing jitters that are often poorly motivated and 
that turn out to be unnecessary. (The transmitted power does not depend on the 
realization of the fictitious jitter.) The Power Spectral Density in Pulse Amplitude 
Modulation and Quadrature Amplitude Modulation is discussed in Chapters 15 
and 18. The discussion requires a definition for Power Spectral Density for non- 
stationary processes (Definitions 15.3.1 and 18.4.1) and a proof that this definition 
coincides with the classical definition when the process is wide-sense stationary 
(Theorem 25.14.3). 

Chapter 19 opens the second part of the book, which deals with noise and detection. 
It introduces the univariate Gaussian distribution and some related distributions. 
The principles of Detection Theory are presented in Chapters 20-22. I emphasize 
the notion of Sufficient Statistics, which is central to Detection Theory. Building 
on Chapter 19, Chapter 23 introduces the all-important multivariate Gaussian
distribution. Chapter 24 treats the complex case. 

Chapter 25 deals with continuous-time stochastic processes with an emphasis on 
stationary Gaussian processes, which are often used to model the noise in Digital 
Communications. This chapter also introduces white Gaussian noise. My approach 
to this topic is perhaps new and is probably where this text differs the most from 
other textbooks on the subject. 

I define white Gaussian noise of double-sided power spectral density $N_0/2$
with respect to the bandwidth W as any measurable,¹ stationary, Gaussian
stochastic process whose power spectral density is a nonnegative, symmetric,
integrable function of frequency that is equal to $N_0/2$ at all frequencies $f$
satisfying $|f| \le W$. The power spectral density at other frequencies can be
arbitrary. An example of the power spectral density of such a process is
depicted in Figure 1.
Adopting this definition has a number of advantages. The first is, of course, that
such processes exist. One need not discuss "generalized processes," Gaussian
processes with infinite variances (that, by definition, do not exist), or introduce the
Itô calculus to study stochastic integrals. (Stochastic integrals with respect to the
Brownian motion are mathematically intricate and physically unappealing. The
idea of the noise having infinite power is ludicrous.) The above definition also frees
me from discussing Dirac's Delta, and, in fact, Dirac's Delta is never used in this
book. (A rigorous treatment of Generalized Functions is beyond the engineering
curriculum in most schools, so using Dirac's Delta always gives the reader the
unsettling feeling of being on unsure footing.)

Figure 1: The power spectral density of a white Gaussian noise process of
double-sided power spectral density $N_0/2$ with respect to the bandwidth W.

The detection problem in white Gaussian noise is treated in Chapter 26. No course
in Digital Communications should end without Theorem 26.4.1. Roughly speaking,
this theorem states that if the mean-signals are bandlimited to W Hz and if
the noise is white Gaussian noise with respect to the bandwidth W, then the inner
products between the received signal and the mean-signals form a sufficient
statistic. Numerous examples as well as a treatment of colored noise are also discussed
in this chapter. Extensions to noncoherent detection are addressed in Chapter 27
and implications for Pulse Amplitude Modulation and for Quadrature Amplitude
Modulation in Chapter 28.

The book concludes with Chapter 29, which introduces Coding. It emphasizes how
the code design influences the transmitted power, the transmitted power spectral
density, the required bandwidth, and the probability of error. The construction of
good codes is left to texts on Coding Theory.

¹ This book does not assume any Measure Theory and does not teach any Measure Theory.
(I do define sets of Lebesgue measure zero in order to be able to state uniqueness theorems.) I
use Measure Theory only in stating theorems that require measurability assumptions. This is
in line with my attempt to state theorems together with all the assumptions that are required
for their validity. I recommend that students ignore measurability issues and just make a mental
note that whenever measurability is mentioned there is a minor technical condition lurking in the
background.
Basic Latin

Mathematics sometimes reads like a foreign language. I therefore include here a
short glossary for such terms as "i.e.," "that is," "in particular," "a fortiori," "for
example," and "e.g.," whose meaning in Mathematics is slightly different from the
definition you will find in your English dictionary. In mathematical contexts these
terms are actually logical statements that the reader should verify. Verifying these
statements is an important way to make sure that you understand the math.

What are these logical statements? First note the synonym "i.e." = "that is" and
the synonym "e.g." = "for example." Next note that the term "that is" often
indicates that the statement following the term is equivalent to the one preceding
it: "We next show that p is a prime, i.e., that p is a positive integer that is not
divisible by any number other than one and itself." The terms "in particular"
or "a fortiori" indicate that the statement following them is implied by the one
preceding them: "Since $g(\cdot)$ is differentiable and, a fortiori, continuous, it follows
from the Mean Value Theorem that the integral of $g(\cdot)$ over the interval [0, 1] is
equal to $g(\xi)$ for some $\xi \in [0, 1]$." The term "for example" can have its regular
day-to-day meaning but in mathematical writing it also sometimes indicates that
the statement following it implies the one preceding it: "Suppose that the function
$g(\cdot)$ is monotonically nondecreasing, e.g., that it is differentiable with a nonnegative
derivative."

Another important word to look out for is "indeed," which in this book typically 
signifies that the statement just made is about to be expanded upon and explained. 
So when you read something that is unclear to you, be sure to check whether the 
next sentence begins with the word "indeed" before you panic. 

The Latin phrases "a priori" and "a posteriori" show up in Probability Theory. 
The former is usually associated with the unconditional probability of an event and 
the latter with the conditional. Thus, the "a priori" probability that the sun will 
shine this Sunday in Zurich is 25%, but now that I know that it is raining today, 
my outlook on life changes and I assign this event the a posteriori probability of 
15%. 

The phrase "prima facie" is roughly equivalent to the phrase "before any further 
mathematical arguments have been presented." For example, the definition of the 
projection of a signal v onto the signal u as the vector w that is collinear with u and 
for which v - w is orthogonal to u, may be followed by the sentence: "Prima facie,
it is not clear that the projection always exists and that it is unique. Nevertheless, 
as we next show, this is the case." 



Syllabuses or Syllabi 

The book can be used as a textbook for a number of different courses. For a course 
that focuses on deterministic signals one could use Chapters 1-9 & Chapter 11. 
A course that covers Stochastic Processes and Detection Theory could be based 
on Chapter 12 and Chapters 19-26 with or without discrete-time stochastic
processes (Chapter 13) and with or without complex random variables and processes
(Chapters 17 & 24).

For a course on Digital Communications one could use the entire book or, if time 
does not permit it, discuss only baseband communication. In the latter case one 
could omit Chapters 7, 9, 16, 17, 18, 24, 27, and Section 28.5.

The dependencies between the chapters are depicted on Page xxiii. 

A web page for this book can be found at 

www.afoundationindigitalcommunication.ethz.ch
Chapter 1



Some Essential Notation 



Reading a whole chapter about notation can be boring. We have thus chosen to 
collect here only the essentials and to introduce the rest when it is first used. The 
"List of Symbols" on Page 704 is more comprehensive. 

We denote the set of complex numbers by $\mathbb{C}$, the set of real numbers by $\mathbb{R}$, the set
of integers by $\mathbb{Z}$, and the set of natural numbers (positive integers) by $\mathbb{N}$. Thus,

\[ \mathbb{N} = \{ n \in \mathbb{Z} : n \ge 1 \}. \]

The above equation is not meant to belabor the point. We use it to introduce the 
notation 



\[ \{ x \in A : \text{statement} \} \]



for the set consisting of all those elements of the set A for which "statement" holds. 

In treating real numbers, we use the notation (a, b), [a, b), [a, b], (a, b] to denote
open, half open on the right, closed, and half open on the left intervals of the real
line. Thus, for example,

\[ [a, b) = \{ x \in \mathbb{R} : a \le x < b \}. \]

A statement followed by a comma and a condition indicates that the statement 
holds whenever the condition is satisfied. For example, 

\[ |a_n - a| < \epsilon, \quad n \ge n_0 \]

means that $|a_n - a| < \epsilon$ whenever $n \ge n_0$.

We use I{statement} to denote the indicator of the statement. It is equal to 1 if
the statement is true, and it is equal to 0 if the statement is false. Thus

\[ \mathrm{I}\{\text{statement}\} = \begin{cases} 1 & \text{if statement is true,} \\ 0 & \text{if statement is false.} \end{cases} \]



In dealing with complex numbers we use i to denote the purely imaginary
unit-magnitude complex number

\[ i = \sqrt{-1}. \]

We use $z^*$ to denote the complex conjugate of z, we use Re(z) to denote the real
part of z, we use Im(z) to denote the imaginary part of z, and we use $|z|$ to denote
the absolute value (or "modulus," or "complex magnitude") of z. Thus, if $z = a + ib$,
where $a, b \in \mathbb{R}$, then $z^* = a - ib$, Re(z) = a, Im(z) = b, and $|z| = \sqrt{a^2 + b^2}$.

The notation used to define functions is extremely important and is, alas, some- 
times confusing to students, so please pay attention. A function or a mapping 
associates with each element in its domain a unique element in its range. If a 
function has a name, the name is often written in bold as in $\mathbf{u}$.¹ Alternatively, we
sometimes denote a function $\mathbf{u}$ by $u(\cdot)$. The notation

\[ \mathbf{u}\colon A \to B \]

indicates that u is a function of domain A and range B. The rule specifying for
each element of the domain the element in the range to which it is mapped is often
written to the right or underneath. Thus, for example,

\[ \mathbf{u}\colon \mathbb{R} \to (-5, \infty), \quad t \mapsto t^2 \]

indicates that the domain of the function u is the reals, that its range is the set
of real numbers that exceed $-5$, and that u associates with t the nonnegative
number $t^2$. We write u(t) for the result of applying the mapping u to t. The
image of a mapping $\mathbf{u}\colon A \to B$ is the set of all elements of the range B to which
at least one element in the domain is mapped by u:

\[ \bigl( \text{image of } \mathbf{u}\colon A \to B \bigr) = \{ u(x) : x \in A \}. \tag{1.1} \]

The image of a mapping is a subset of its range. In the above example, the image
of the mapping is the set of nonnegative reals $[0, \infty)$. A mapping $\mathbf{u}\colon A \to B$ is said
to be onto (or surjective) if its image is equal to its range. Thus, $\mathbf{u}\colon A \to B$ is
onto if, and only if, for every $y \in B$ there corresponds some $x \in A$ (not necessarily
unique) such that u(x) = y. If the image of $g(\cdot)$ is a subset of the domain of
$h(\cdot)$, then the composition of $g(\cdot)$ and $h(\cdot)$ is the mapping $x \mapsto h(g(x))$, which is
denoted by $h \circ g$.
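On a finite domain the notions of image, onto, and composition can be checked mechanically. The following Python sketch is only an illustration: the helper names `image`, `is_onto`, and `compose`, as well as the finite sets standing in for the domain and range, are assumptions made here and are not notation used in this book.

```python
def image(u, domain):
    """Return the image {u(x) : x in domain} of the mapping u."""
    return {u(x) for x in domain}


def is_onto(u, domain, rng):
    """A mapping is onto if its image equals its range (here the set rng)."""
    return image(u, domain) == set(rng)


def compose(h, g):
    """Return the composition h o g, i.e., the mapping x -> h(g(x))."""
    return lambda x: h(g(x))


# Hypothetical finite stand-in for the example t -> t^2 in the text.
A = {-2, -1, 0, 1, 2}
B = {-4, -3, -2, -1, 0, 1, 2, 3, 4}


def square(t):
    return t * t


def add_one(y):
    return y + 1


square_image = image(square, A)                 # a subset of the range B
onto_B = is_onto(square, A, B)                  # the image is a strict subset of B
onto_image = is_onto(square, A, square_image)   # onto once the range is shrunk
h_after_g = compose(add_one, square)            # the mapping x -> x^2 + 1
```

As in the text, whether a mapping is onto depends on the range that is declared for it and not only on the rule.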

Sometimes we do not specify the domain and range of a function if they are clear 
from the context. Thus, we might write $\mathbf{u}\colon t \mapsto v(t) \cos(2\pi f_c t)$ without making
explicit what the domain and range of u are. In fact, if there is no need to give a
function a name, then we will not. For example, we might write $t \mapsto v(t) \cos(2\pi f_c t)$
to designate the unnamed function that maps t to $v(t) \cos(2\pi f_c t)$. (Here $v(\cdot)$ is
some other function, which was presumably defined before.) 

If the domain of a function u is $\mathbb{R}$ and if the range is $\mathbb{R}$, then we sometimes say
that u is a real-valued signal or a real signal, especially if the argument of u
stands for time. Similarly we shall sometimes refer to a function $\mathbf{u}\colon \mathbb{R} \to \mathbb{C}$ as a
complex-valued signal or a complex signal. If we refer to u as a signal, then
the question whether it is complex-valued or real-valued should be clear from the
context, or else immaterial to the claim.

¹ But some special functions such as the self-similarity function $R_{gg}$, the autocovariance
function $K_{XX}$, and the power spectral density $S_{XX}$, which will be introduced in later chapters,
are not in boldface.

We caution the reader that, while u and u(-) denote functions, u(t) denotes the 
result of applying u to t. If u is a real-valued signal then u(t) is a real number! 

Given two signals u and v we define their superposition or sum as the signal
$t \mapsto u(t) + v(t)$. We denote this signal by u + v. Also, if $\alpha \in \mathbb{C}$ and u is any signal,
then we define the amplification of u by $\alpha$ as the signal $t \mapsto \alpha u(t)$. We denote
this signal by $\alpha\mathbf{u}$. Thus,

\[ \alpha \mathbf{u} + \beta \mathbf{v} \]

is the signal

\[ t \mapsto \alpha u(t) + \beta v(t). \]

We refer to the function that maps every element in its domain to zero as the
all-zero function and we denote it by 0. The all-zero signal maps every $t \in \mathbb{R}$
to zero. If $\mathbf{x}\colon \mathbb{R} \to \mathbb{C}$ is a signal that maps every $t \in \mathbb{R}$ to x(t), then its reflection
or mirror image is denoted by $\overleftarrow{\mathbf{x}}$ and is the signal that is defined by

\[ \overleftarrow{\mathbf{x}}\colon t \mapsto x(-t). \]
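The distinction between a signal and its value at a time instant, and the operations of superposition, amplification, and reflection, can be mimicked in code by treating signals as callables. This is a minimal sketch under the assumption that a signal is represented by a Python function of a real argument; the names `Signal`, `superpose`, `amplify`, and `reflect` are hypothetical and chosen here for illustration only.

```python
import cmath
from typing import Callable

# A signal is modeled as a callable from a real time t to a complex value.
Signal = Callable[[float], complex]


def superpose(u: Signal, v: Signal) -> Signal:
    """Return the signal t -> u(t) + v(t)."""
    return lambda t: u(t) + v(t)


def amplify(alpha: complex, u: Signal) -> Signal:
    """Return the signal t -> alpha * u(t)."""
    return lambda t: alpha * u(t)


def reflect(x: Signal) -> Signal:
    """Return the mirror image t -> x(-t)."""
    return lambda t: x(-t)


def u(t: float) -> complex:
    return t * t


def v(t: float) -> complex:
    return cmath.exp(1j * t)


# w is a signal (a function); w(0.0) is a number.
w = superpose(amplify(2.0, u), amplify(1j, v))
```

Here `w` plays the role of the signal 2u + iv, whereas `w(0.0)` is the complex number obtained by applying that signal to the time instant 0.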

Dirac's Delta (which will hardly be mentioned in this book) is not a function. 

A probability space is defined as a triplet $(\Omega, \mathcal{F}, P)$, where the set $\Omega$ is the set of
experiment outcomes, the elements of the set $\mathcal{F}$ are subsets of $\Omega$ and are called
events, and where $P\colon \mathcal{F} \to [0, 1]$ assigns probabilities to the various events. It is
assumed that $\mathcal{F}$ forms a $\sigma$-algebra, i.e., that $\Omega \in \mathcal{F}$; that if a set is in $\mathcal{F}$ then so
is its complement (with respect to $\Omega$); and that every finite or countable union of
elements of $\mathcal{F}$ is also an element of $\mathcal{F}$. A random variable X is a mapping from $\Omega$
to $\mathbb{R}$ that satisfies the technical condition that

\[ \{ \omega \in \Omega : X(\omega) \le \xi \} \in \mathcal{F}, \quad \xi \in \mathbb{R}. \tag{1.2} \]

This condition guarantees that it is always meaningful to evaluate the probability
that the value of X is smaller than or equal to $\xi$.
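Condition (1.2) has teeth only when the collection of events is smaller than the collection of all subsets. The following Python sketch illustrates this on a toy example; the four-point outcome set, the coarse collection of events, and the helper names `is_sigma_algebra` and `is_random_variable` are all assumptions introduced here. Because the outcome set is finite, checking pairwise unions suffices for closure under unions, and checking (1.2) at the values that the mapping actually takes suffices for all real thresholds.

```python
from itertools import combinations

# Hypothetical toy example: four outcomes and a coarse collection of events.
OMEGA = frozenset({1, 2, 3, 4})
F = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), OMEGA}


def is_sigma_algebra(omega, events):
    """Check the three defining properties on a finite outcome set."""
    if omega not in events:
        return False
    if any(omega - a not in events for a in events):
        return False
    # On a finite set, closure under pairwise unions gives all finite unions.
    return all(a | b in events for a, b in combinations(events, 2))


def is_random_variable(x, omega, events):
    """Check that {w : x(w) <= xi} is an event for every threshold xi.

    The set only changes at the values taken by x, so those thresholds
    (plus the empty set, for thresholds below every value) suffice.
    """
    if frozenset() not in events:
        return False
    thresholds = {x(w) for w in omega}
    return all(
        frozenset(w for w in omega if x(w) <= xi) in events
        for xi in thresholds
    )


def x_coarse(w):
    # Constant on {1, 2} and on {3, 4}: compatible with F.
    return 0.0 if w in {1, 2} else 1.0


def x_fine(w):
    # Distinguishes outcome 1 from outcome 2, which F cannot do.
    return float(w)
```

With these definitions `x_coarse` satisfies (1.2) with respect to `F`, whereas `x_fine` does not, because the set of outcomes on which it is at most 1 is {1}, which is not an event.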



Chapter 2 

Signals, Integrals, and Sets of Measure Zero 

2.1 Introduction 

The purpose of this chapter is not to develop the Lebesgue theory of integration. 
Mastering this theory is not essential to understanding Digital Communications. 
But some concepts from this theory are needed in order to state the main results 
of Digital Communications in a mathematically rigorous way. In this chapter 
we introduce these required concepts and provide references to the mathematical 
literature that develops them. 

The less mathematically-inclined may gloss over most of this chapter. Readers
who interpret the integrals in this book as Riemann integrals; who interpret
"measurable" as "satisfying a minor mathematical restriction"; who interpret "a set of
Lebesgue measure zero" as "a set that is so small that integrals of functions are
not sensitive to the value the integrand takes in this set"; and who swap orders of
summations, expectations and integrations fearlessly will not miss any engineering
insights.

But all readers should pay attention to the way the integral of complex-valued 
signals is defined (Section 2.3); to the basic inequality (2.13); and to the notation 
introduced in (2.6). 

2.2 Integrals 

Recall that a real-valued signal u is a function $\mathbf{u}\colon \mathbb{R} \to \mathbb{R}$. The integral of u is
denoted by

\[ \int_{-\infty}^{\infty} u(t) \, dt. \tag{2.1} \]

For (2.1) to be meaningful some technical conditions must be met. (You may re- 
call from your calculus studies, for example, that not every function is Riemann 
integrable.) In this book all integrals will be understood to be Lebesgue integrals, 
but nothing essential will be lost on readers who interpret them as Riemann inte- 
grals. For the Lebesgue integral to be defined the integrand u must be a Lebesgue 
measurable function. Again, do not worry if you have not studied the Lebesgue
integral or the notion of measurable functions. We point this out merely to cover
ourselves when we state various theorems. Also, for the integral in (2.1) to be 
defined we insist that 



\[ \int_{-\infty}^{\infty} |u(t)| \, dt < \infty. \tag{2.2} \]



(There are ways of defining the integral in (2.1) also when (2.2) is violated, but 
they lead to fragile expressions that are difficult to manipulate.) 

A function $\mathbf{u}\colon \mathbb{R} \to \mathbb{R}$ which is Lebesgue measurable and which satisfies (2.2) is
said to be integrable, and we denote the set of all such functions by $\mathcal{L}_1$. We shall
refrain from integrating functions that are not elements of $\mathcal{L}_1$.

2.3 Integrating Complex-Valued Signals 



This section should assuage your fear of integrating complex-valued signals. (Some
of you may have a trauma from your Complex Analysis courses where you dealt
with integrals of functions from the complex plane to the complex plane. Here
things are much simpler because we are dealing only with integrals of functions
from the real line to the complex plane.) We formally define the integral of a
complex-valued function $\mathbf{u}\colon \mathbb{R} \to \mathbb{C}$ by

\[ \int_{-\infty}^{\infty} u(t) \, dt = \int_{-\infty}^{\infty} \operatorname{Re}\bigl(u(t)\bigr) \, dt + i \int_{-\infty}^{\infty} \operatorname{Im}\bigl(u(t)\bigr) \, dt. \tag{2.3} \]

For this to be meaningful, we require that the real functions $t \mapsto \operatorname{Re}\bigl(u(t)\bigr)$ and
$t \mapsto \operatorname{Im}\bigl(u(t)\bigr)$ both be integrable real functions. That is, they should both be
Lebesgue measurable and we should have

\[ \int_{-\infty}^{\infty} \bigl|\operatorname{Re}\bigl(u(t)\bigr)\bigr| \, dt < \infty \quad \text{and} \quad \int_{-\infty}^{\infty} \bigl|\operatorname{Im}\bigl(u(t)\bigr)\bigr| \, dt < \infty. \tag{2.4} \]



It is not difficult to show that (2.4) is equivalent to the more compact condition 

\[ \int_{-\infty}^{\infty} |u(t)| \, dt < \infty. \tag{2.5} \]
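One way to see this equivalence is sketched here, taking for granted that $t \mapsto |u(t)|$ is Lebesgue measurable whenever u is. Every complex number z satisfies

\[ \max\bigl\{ |\operatorname{Re}(z)|, \, |\operatorname{Im}(z)| \bigr\} \le |z| \le |\operatorname{Re}(z)| + |\operatorname{Im}(z)|. \]

Substituting u(t) for z and integrating over t yields

\[ \max\biggl\{ \int_{-\infty}^{\infty} \bigl|\operatorname{Re}\bigl(u(t)\bigr)\bigr| \, dt, \, \int_{-\infty}^{\infty} \bigl|\operatorname{Im}\bigl(u(t)\bigr)\bigr| \, dt \biggr\} \le \int_{-\infty}^{\infty} |u(t)| \, dt \le \int_{-\infty}^{\infty} \bigl|\operatorname{Re}\bigl(u(t)\bigr)\bigr| \, dt + \int_{-\infty}^{\infty} \bigl|\operatorname{Im}\bigl(u(t)\bigr)\bigr| \, dt, \]

so the integral of $|u(t)|$ is finite if, and only if, the integrals of $\bigl|\operatorname{Re}\bigl(u(t)\bigr)\bigr|$ and of $\bigl|\operatorname{Im}\bigl(u(t)\bigr)\bigr|$ are both finite.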



We say that a complex signal $\mathbf{u}\colon \mathbb{R} \to \mathbb{C}$ is Lebesgue measurable if the mappings
$t \mapsto \operatorname{Re}\bigl(u(t)\bigr)$ and $t \mapsto \operatorname{Im}\bigl(u(t)\bigr)$ are Lebesgue measurable real signals. We say that
a function $\mathbf{u}\colon \mathbb{R} \to \mathbb{C}$ is integrable if it is Lebesgue measurable and (2.4) holds.
The set of all Lebesgue measurable integrable complex signals is denoted by $\mathcal{L}_1$.
Note that we use the same symbol $\mathcal{L}_1$ to denote both the set of integrable real
signals and the set of integrable complex signals. To which of these two sets we
refer should be clear from the context, or else immaterial.

For $\mathbf{u} \in \mathcal{L}_1$ we define $\|\mathbf{u}\|_1$ as

\[ \|\mathbf{u}\|_1 \triangleq \int_{-\infty}^{\infty} |u(t)| \, dt. \tag{2.6} \]




Before summarizing the key properties of the integral of complex signals we remind 
the reader that if u and v are complex signals and if $\alpha, \beta$ are complex numbers, then
the complex signal $\alpha\mathbf{u} + \beta\mathbf{v}$ is defined as the complex signal $t \mapsto \alpha u(t) + \beta v(t)$. The
intuition for the following proposition comes from thinking about the integrals as 
Riemann integrals, which can be approximated by finite sums and by then invoking 
the analogous results about finite sums. 

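Proposition 2.3.1 (Properties of Complex Integrals). Let the complex signals u, v
be in $\mathcal{L}_1$, and let $\alpha, \beta$ be arbitrary complex numbers.

(i) Integration is linear in the sense that $\alpha\mathbf{u} + \beta\mathbf{v} \in \mathcal{L}_1$ and

\[ \int_{-\infty}^{\infty} \bigl( \alpha u(t) + \beta v(t) \bigr) \, dt = \alpha \int_{-\infty}^{\infty} u(t) \, dt + \beta \int_{-\infty}^{\infty} v(t) \, dt. \tag{2.7} \]

(ii) Integration commutes with complex conjugation

\[ \int_{-\infty}^{\infty} u^*(t) \, dt = \biggl( \int_{-\infty}^{\infty} u(t) \, dt \biggr)^{*}. \tag{2.8} \]

(iii) Integration commutes with the operation of taking the real part

\[ \operatorname{Re}\biggl( \int_{-\infty}^{\infty} u(t) \, dt \biggr) = \int_{-\infty}^{\infty} \operatorname{Re}\bigl(u(t)\bigr) \, dt. \tag{2.9} \]

(iv) Integration commutes with the operation of taking the imaginary part

\[ \operatorname{Im}\biggl( \int_{-\infty}^{\infty} u(t) \, dt \biggr) = \int_{-\infty}^{\infty} \operatorname{Im}\bigl(u(t)\bigr) \, dt. \tag{2.10} \]

Proof. For a proof of (i) see, for example, (Rudin, 1974, Theorem 1.32). The rest
of the claims follow easily from the definition of the integral of a complex-valued
signal (2.3). □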
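In the spirit of the Riemann-sum intuition mentioned before the proposition, the definition (2.3) and the properties above can be checked numerically. The sketch below is an illustration only: it replaces the integral over the real line by a midpoint Riemann sum over the interval [-8, 8], and the test signals, the constants `ALPHA` and `BETA`, and the function names are assumptions made here.

```python
import cmath
import math


def integrate_real(f, a, b, n=20_000):
    """Midpoint Riemann sum of a real-valued function f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def integrate_complex(u, a, b, n=20_000):
    """Follow (2.3): integrate the real and imaginary parts separately."""
    re = integrate_real(lambda t: u(t).real, a, b, n)
    im = integrate_real(lambda t: u(t).imag, a, b, n)
    return complex(re, im)


def u(t):
    # A Gaussian pulse multiplied by a complex exponential.
    return math.exp(-t * t) * cmath.exp(2j * math.pi * t)


def v(t):
    return complex(1.0 / (1.0 + t * t), 0.0)


A, B = -8.0, 8.0          # truncation of the real line (an approximation)
ALPHA, BETA = 2 - 1j, 0.5j

I_u = integrate_complex(u, A, B)
I_v = integrate_complex(v, A, B)
I_conj = integrate_complex(lambda t: u(t).conjugate(), A, B)
I_lin = integrate_complex(lambda t: ALPHA * u(t) + BETA * v(t), A, B)
```

Up to rounding, `I_conj` agrees with the conjugate of `I_u`, as in (2.8), and `I_lin` agrees with `ALPHA * I_u + BETA * I_v`, as in (2.7). Such a check is of course no substitute for the proof.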

2.4 An Inequality for Integrals 

Probably the most important inequality for complex numbers is the Triangle 
Inequality for Complex Numbers 

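\[ |w + z| \le |w| + |z|, \quad w, z \in \mathbb{C}. \tag{2.11} \]

This inequality extends by induction to finite sums:

\[ \Bigl| \sum_{j=1}^{n} z_j \Bigr| \le \sum_{j=1}^{n} |z_j|, \quad z_1, \ldots, z_n \in \mathbb{C}. \tag{2.12} \]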



The extension to integrals is the most important inequality for integrals: 
Proposition 2.4.1. For every complex-valued or real-valued signal u in $\mathcal{L}_1$

\[ \biggl| \int_{-\infty}^{\infty} u(t) \, dt \biggr| \le \int_{-\infty}^{\infty} |u(t)| \, dt. \tag{2.13} \]

Proof. See, for example, (Rudin, 1974, Theorem 1.33). □

Note that in (2.13) we should interpret $|\cdot|$ as the absolute-value function if u is a
real signal, and as the modulus function if u is a complex signal.

Another simple but useful inequality is 

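\[ \|\mathbf{u} + \mathbf{v}\|_1 \le \|\mathbf{u}\|_1 + \|\mathbf{v}\|_1, \quad \mathbf{u}, \mathbf{v} \in \mathcal{L}_1, \tag{2.14} \]

which can be proved using the calculation

\[ \begin{aligned} \|\mathbf{u} + \mathbf{v}\|_1 &= \int_{-\infty}^{\infty} |u(t) + v(t)| \, dt \\ &\le \int_{-\infty}^{\infty} \bigl( |u(t)| + |v(t)| \bigr) \, dt \\ &= \int_{-\infty}^{\infty} |u(t)| \, dt + \int_{-\infty}^{\infty} |v(t)| \, dt \\ &= \|\mathbf{u}\|_1 + \|\mathbf{v}\|_1, \end{aligned} \]

where the inequality follows by applying the Triangle Inequality for Complex
Numbers (2.11) with the substitution of u(t) for w and v(t) for z.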
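Both (2.13) and (2.14) can be observed numerically. The following sketch is an illustration under stated assumptions: integrals over the real line are approximated by midpoint Riemann sums over [-8, 8], and the signals `u` and `v` and the helper names `riemann` and `l1_norm` are chosen here. The signal `u` oscillates, so the magnitude of its integral is much smaller than the integral of its magnitude.

```python
import cmath
import math


def riemann(f, a, b, n=20_000):
    """Midpoint Riemann sum of f over [a, b]; f may be real or complex valued."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def l1_norm(u, a, b):
    """Approximate the norm of (2.6), with the integral restricted to [a, b]."""
    return riemann(lambda t: abs(u(t)), a, b)


def u(t):
    # Decaying envelope times a complex exponential: heavy cancellation.
    return math.exp(-abs(t)) * cmath.exp(2j * math.pi * t)


def v(t):
    return -math.exp(-t * t)


A, B = -8.0, 8.0                                   # truncation (an approximation)
lhs_213 = abs(riemann(u, A, B))                    # magnitude of the integral of u
rhs_213 = l1_norm(u, A, B)                         # integral of the magnitude of u
lhs_214 = l1_norm(lambda t: u(t) + v(t), A, B)     # norm of the sum
rhs_214 = l1_norm(u, A, B) + l1_norm(v, A, B)      # sum of the norms
```

In both comparisons the left-hand side does not exceed the right-hand side. For the Riemann sums this is just (2.12) applied to finitely many terms.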



2.5 Sets of Lebesgue Measure Zero 

It is one of life's minor grievances that the integral of a nonnegative function can 
be zero even if the function is not identically zero. For example, $t \mapsto \mathrm{I}\{t = 17\}$ is a
nonnegative function whose integral is zero and which is nonetheless not identically
zero (it maps 17 to one). In this section we shall derive a necessary and sufficient
condition for the integral of a nonnegative function to be zero. This condition will
allow us later to state conditions under which various integral inequalities hold 
with equality. It will give mathematical meaning to the physical intuition that if 
the waveform describing some physical phenomenon (such as voltage over a resistor) 
is nonnegative and integrates to zero then "for all practical purposes" the waveform 
is zero. 

We shall define sets of Lebesgue measure zero and then show that a nonnegative
function $\mathbf{u}\colon \mathbb{R} \to [0, \infty)$ integrates to zero if, and only if, the set $\{t \in \mathbb{R} : u(t) > 0\}$
is of Lebesgue measure zero. We shall then introduce the notation $\mathbf{u} \equiv \mathbf{v}$ to indicate
that the set $\{t \in \mathbb{R} : u(t) \ne v(t)\}$ is of Lebesgue measure zero.

It should be noted that since the integral is unaltered when the integrand is changed 
at a finite (or countable) number of points, it follows that any nonnegative function 
that is zero except at a countable number of points integrates to zero. The reverse,
however, is not true. One can find nonnegative functions that integrate to zero
and that are nonzero on an uncountable set of points. 

The less mathematically inclined readers may skip the mathematical definition of 
sets of measure zero and just think of a subset of the real line as being of Lebesgue 
measure zero if it is so "small" that the integral of any function is unaltered when 
the values it takes in the subset are altered. Such readers should then think of the 
statement $\mathbf{u} \equiv \mathbf{v}$ as indicating that $\mathbf{u} - \mathbf{v}$ is just the result of altering the all-zero
signal on a set of Lebesgue measure zero and that, consequently,

\[ \int_{-\infty}^{\infty} |u(t) - v(t)| \, dt = 0. \]

Definition 2.5.1 (Sets of Lebesgue Measure Zero). We say that a subset $\mathcal{N}$ of
the real line $\mathbb{R}$ is a set of Lebesgue measure zero (or a Lebesgue null set)
if for every $\epsilon > 0$ we can find a sequence of intervals $[a_1, b_1], [a_2, b_2], \ldots$ such that
the total length of the intervals is smaller than or equal to $\epsilon$

\[ \sum_{j=1}^{\infty} (b_j - a_j) \le \epsilon \tag{2.15a} \]

and such that the union of the intervals covers the set $\mathcal{N}$

\[ \mathcal{N} \subseteq [a_1, b_1] \cup [a_2, b_2] \cup \cdots. \tag{2.15b} \]

As an example, note that the set $\{1\}$ is of Lebesgue measure zero. Indeed, it is
covered by the single interval $[1 - \epsilon/2, 1 + \epsilon/2]$ whose length is $\epsilon$. Similarly, any
finite set is of Lebesgue measure zero. Indeed, the set $\{\alpha_1, \ldots, \alpha_n\}$ can be covered
by $n$ intervals of total length not exceeding $\epsilon$ as follows:

$$\{\alpha_1, \ldots, \alpha_n\} \subset [\alpha_1 - \epsilon/(2n), \alpha_1 + \epsilon/(2n)] \cup \cdots \cup [\alpha_n - \epsilon/(2n), \alpha_n + \epsilon/(2n)].$$

This argument can also be extended to show that any countable set is of Lebesgue
measure zero. Indeed, the countable set $\{\alpha_1, \alpha_2, \ldots\}$ can be covered as

$$\{\alpha_1, \alpha_2, \ldots\} \subseteq \bigcup_{j=1}^{\infty} \bigl[\alpha_j - 2^{-j-1}\epsilon, \; \alpha_j + 2^{-j-1}\epsilon\bigr],$$

where we note that the length of the interval $[\alpha_j - 2^{-j-1}\epsilon, \alpha_j + 2^{-j-1}\epsilon]$ is $2^{-j}\epsilon$,
which when summed over $j$ yields $\epsilon$.
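The covering construction for a countable set can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the point set (the first 20 terms of 1, 1/2, 1/3, ...), the value of epsilon, and the helper name `cover` are all hypothetical choices, and a finite prefix stands in for the infinite sequence. Exact rational arithmetic is used so that the total length can be compared with epsilon without rounding.

```python
from fractions import Fraction

def cover(points, eps):
    """Return the intervals [a_j - 2^(-j-1) eps, a_j + 2^(-j-1) eps], j = 1, 2, ..."""
    intervals = []
    for j, a in enumerate(points, start=1):
        half = eps / 2 ** (j + 1)          # half-length of the j-th interval
        intervals.append((a - half, a + half))
    return intervals

eps = Fraction(1, 100)                               # illustrative choice of epsilon
points = [Fraction(1, k) for k in range(1, 21)]      # hypothetical countable set (prefix)
intervals = cover(points, eps)

# The j-th interval has length 2^(-j) eps, so the total is eps * (1 - 2^(-20)).
total_length = sum(hi - lo for lo, hi in intervals)
print(total_length < eps)
```

However many points are added, the total length stays below epsilon, because the lengths form a geometric series.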

With a similar argument one can show that the union of a countable number of 
sets of Lebesgue measure zero is of Lebesgue measure zero. 

The above examples notwithstanding, it should be emphasized that there exist sets 
of Lebesgue measure zero that are not countable. 1 Thus, the concept of a set of 
Lebesgue measure zero is different from the concept of a countable set. 

Loosely speaking, we say that two signals are indistinguishable if they agree except 
possibly on a set of Lebesgue measure zero. We warn the reader, however, that 
this terminology is not standard. 



1 For example, the Cantor set is of Lebesgue measure zero and uncountable; see (Rudin, 1976,
Section 11.11, Remark (f), p. 309).

Definition 2.5.2 (Indistinguishable Functions). We say that the Lebesgue measurable
functions $\mathbf{u}, \mathbf{v}$ from $\mathbb{R}$ to $\mathbb{C}$ (or to $\mathbb{R}$) are indistinguishable and write

$$\mathbf{u} \equiv \mathbf{v}$$

if the set $\{t \in \mathbb{R} : u(t) \neq v(t)\}$ is of Lebesgue measure zero.

Note that $\mathbf{u} \equiv \mathbf{v}$ if, and only if, the signal $\mathbf{u} - \mathbf{v}$ is indistinguishable from the
all-zero signal

$$(\mathbf{u} \equiv \mathbf{v}) \iff (\mathbf{u} - \mathbf{v} \equiv \mathbf{0}). \tag{2.16}$$

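The main result of this section is the following:

Proposition 2.5.3.

(i) A nonnegative Lebesgue measurable signal integrates to zero if, and only if,
it is indistinguishable from the all-zero signal $\mathbf{0}$.

(ii) If $\mathbf{u}, \mathbf{v}$ are Lebesgue measurable functions from $\mathbb{R}$ to $\mathbb{C}$ (or to $\mathbb{R}$), then

$$\left( \int_{-\infty}^{\infty} |u(t) - v(t)| \, dt = 0 \right) \iff (\mathbf{u} \equiv \mathbf{v}) \tag{2.17}$$

and

$$\left( \int_{-\infty}^{\infty} |u(t) - v(t)|^2 \, dt = 0 \right) \iff (\mathbf{u} \equiv \mathbf{v}). \tag{2.18}$$

(iii) If $\mathbf{u}$ and $\mathbf{v}$ are integrable and indistinguishable, then their integrals are equal:

$$(\mathbf{u} \equiv \mathbf{v}) \implies \left( \int_{-\infty}^{\infty} u(t) \, dt = \int_{-\infty}^{\infty} v(t) \, dt \right), \quad \mathbf{u}, \mathbf{v} \in \mathcal{L}_1. \tag{2.19}$$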



Proof. The proof of (i) is not very difficult, but it requires more familiarity with 
Measure Theory than we are willing to assume. The interested reader is thus 
referred to (Rudin, 1974, Theorem 1.39). 

The equivalence in (2.17) follows by applying Part (i) to the nonnegative function
$t \mapsto |u(t) - v(t)|$. Similarly, (2.18) follows by applying Part (i) to the nonnegative
function $t \mapsto |u(t) - v(t)|^2$ and by noting that the set of $t$'s for which $|u(t) - v(t)|^2 \neq 0$
is the same as the set of $t$'s for which $u(t) \neq v(t)$.

Part (iii) follows from (2.17) by noting that

$$\left| \int_{-\infty}^{\infty} u(t) \, dt - \int_{-\infty}^{\infty} v(t) \, dt \right| = \left| \int_{-\infty}^{\infty} \bigl(u(t) - v(t)\bigr) \, dt \right| \le \int_{-\infty}^{\infty} |u(t) - v(t)| \, dt,$$

where the first equality follows by the linearity of integration, and where the
subsequent inequality follows from Proposition 2.4.1. □

2.6 Swapping Integration, Summation, and Expectation

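In numerous places in this text we shall swap the order of integration as in

$$\int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} u(\alpha, \beta) \, d\alpha \right) d\beta = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} u(\alpha, \beta) \, d\beta \right) d\alpha \tag{2.20}$$

or the order of summation as in

$$\sum_{\nu=1}^{\infty} \left( \sum_{\eta=1}^{\infty} a_{\nu,\eta} \right) = \sum_{\eta=1}^{\infty} \left( \sum_{\nu=1}^{\infty} a_{\nu,\eta} \right) \tag{2.21}$$

or the order of summation and integration as in

$$\int_{-\infty}^{\infty} \left( \sum_{\nu=1}^{\infty} a_\nu u_\nu(t) \right) dt = \sum_{\nu=1}^{\infty} a_\nu \left( \int_{-\infty}^{\infty} u_\nu(t) \, dt \right) \tag{2.22}$$

or the order of integration and expectation as in

$$\mathsf{E}\left[ \int_{-\infty}^{\infty} X u(t) \, dt \right] = \int_{-\infty}^{\infty} \mathsf{E}[X u(t)] \, dt = \mathsf{E}[X] \int_{-\infty}^{\infty} u(t) \, dt.$$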

These changes of order are usually justified using Fubini's Theorem, which states 
that these changes of order are permissible provided that a very technical measura- 
bility condition is satisfied and that, in addition, either the integrand is nonnegative 
or that in some order (and hence in all orders) the integrals/summation/expectation 
of the absolute value of the integrand is finite. 

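For example, to justify (2.20) it suffices to verify that the function $u\colon \mathbb{R}^2 \to \mathbb{R}$ in
(2.20) is Lebesgue measurable and that, in addition, it is either nonnegative or

$$\int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} |u(\alpha, \beta)| \, d\alpha \right) d\beta < \infty$$

or

$$\int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} |u(\alpha, \beta)| \, d\beta \right) d\alpha < \infty.$$

Similarly, to justify (2.21) it suffices to show that $a_{\nu,\eta} \ge 0$ or that

$$\sum_{\eta=1}^{\infty} \left( \sum_{\nu=1}^{\infty} |a_{\nu,\eta}| \right) < \infty$$

or that

$$\sum_{\nu=1}^{\infty} \left( \sum_{\eta=1}^{\infty} |a_{\nu,\eta}| \right) < \infty.$$

(No need to worry about measurability, which is automatic in this setup.)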




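As a final example, to justify (2.22) it suffices that the functions $\{u_\nu\}$ are all
measurable and that either $a_\nu u_\nu(t)$ is nonnegative for all $\nu \in \mathbb{N}$ and $t \in \mathbb{R}$ or

$$\int_{-\infty}^{\infty} \left( \sum_{\nu=1}^{\infty} |a_\nu u_\nu(t)| \right) dt < \infty$$

or

$$\sum_{\nu=1}^{\infty} \left( \int_{-\infty}^{\infty} |a_\nu u_\nu(t)| \, dt \right) < \infty.$$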



A precise statement of Fubini's Theorem requires some Measure Theory that is 
beyond the scope of this book. The reader is referred to (Rudin, 1974, Theorem 
7.8) and (Billingsley, 1995, Chapter 3, Section 18) for such a statement and for a 
proof. 

We shall frequently use the swapping-of-order argument to manipulate the square 
of a sum or the square of an integral. 

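Proposition 2.6.1.

(i) If $\sum_{\nu} |a_\nu| < \infty$, then

$$\left( \sum_{\nu=1}^{\infty} a_\nu \right)^2 = \sum_{\nu=1}^{\infty} \sum_{\nu'=1}^{\infty} a_\nu a_{\nu'}. \tag{2.23}$$

(ii) If $\mathbf{u}$ is an integrable real-valued or complex-valued signal, then

$$\left( \int_{-\infty}^{\infty} u(\alpha) \, d\alpha \right)^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} u(\alpha) \, u(\alpha') \, d\alpha \, d\alpha'. \tag{2.24}$$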



Proof. The proof is a direct application of Fubini's Theorem. But ignoring the
technicalities, the intuition is quite clear: it all boils down to the fact that $(a + b)^2$
can be written as $(a + b)(a + b)$, which can in turn be written as $aa + ab + ba + bb$. □
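The identity in Part (i) of Proposition 2.6.1, namely that the square of a sum equals the double sum of all pairwise products, can be checked numerically on a truncated sequence. The sketch below is only a sanity check, not a proof: the complex sequence $a_\nu = (0.5\,e^{\mathrm{i}})^\nu$ and the truncation to 40 terms are hypothetical choices made for illustration.

```python
import cmath

# Hypothetical absolutely summable sequence a_nu = (0.5 * e^{i})^nu, truncated to 40 terms.
a = [(0.5 * cmath.exp(1j)) ** nu for nu in range(1, 41)]

square_of_sum = sum(a) ** 2
# Double sum over nu and nu' of a_nu * a_nu' (note: no conjugation in this identity).
double_sum = sum(x * y for x in a for y in a)

# The two quantities agree up to floating-point rounding.
print(abs(square_of_sum - double_sum))
```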



2.7 Additional Reading 

Numerous books cover the basics of Lebesgue integration. Classic examples are 
(Riesz and Sz.-Nagy, 1990), (Rudin, 1974) and (Royden, 1988). These texts also 
cover the notion of sets of Lebesgue measure zero, e.g., (Riesz and Sz.-Nagy, 
1990, Chapter 1, Section 2). For the changing of order of Riemann integration 
see (Körner, 1988, Chapters 47 & 48).



2.8 Exercises 



Exercise 2.1 (Integrating an Exponential). Show that

$$\int_0^{\infty} e^{-zt} \, dt = \frac{1}{z}, \quad \operatorname{Re}(z) > 0.$$

Exercise 2.2 (Triangle Inequality for Complex Numbers). Prove the Triangle Inequality
for complex numbers (2.11). Under what conditions does it hold with equality?

Exercise 2.3 (When Are Complex Numbers Equal?). Prove that if the complex numbers
$w$ and $z$ are such that $\operatorname{Re}(\beta z) = \operatorname{Re}(\beta w)$ for all $\beta \in \mathbb{C}$, then $w = z$.

Exercise 2.4 (An Integral Inequality). Show that if $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are integrable signals,
then

$$\int_{-\infty}^{\infty} |u(t) - w(t)| \, dt \le \int_{-\infty}^{\infty} |u(t) - v(t)| \, dt + \int_{-\infty}^{\infty} |v(t) - w(t)| \, dt.$$

Exercise 2.5 (An Integral to Note). Given some $f \in \mathbb{R}$, compute the integral

$$\int_{-\infty}^{\infty} \operatorname{I}\{t = 17\} \, e^{-\mathrm{i} 2\pi f t} \, dt.$$



Exercise 2.6 (Subsets of Sets of Lebesgue Measure Zero). Show that a subset of a set 
of Lebesgue measure zero must also be of Lebesgue measure zero. 

Exercise 2.7 (Nonuniqueness of the Probability Density Function). We say that the
random variable $X$ is of density $f_X(\cdot)$ if $f_X(\cdot)$ is a (Lebesgue measurable) nonnegative
function such that

$$\Pr[X \le x] = \int_{-\infty}^{x} f_X(\xi) \, d\xi, \quad x \in \mathbb{R}.$$

Show that if $X$ is of density $f_X(\cdot)$ and if $g(\cdot)$ is a nonnegative function that is
indistinguishable from $f_X(\cdot)$, then $X$ is also of density $g(\cdot)$. (The reverse is also true: if $X$ is of
density $g_1(\cdot)$ and also of density $g_2(\cdot)$, then $g_1(\cdot)$ and $g_2(\cdot)$ must be indistinguishable.)

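Exercise 2.8 (Indistinguishability). Let $\psi\colon \mathbb{R}^2 \to \mathbb{R}$ satisfy $\psi(\alpha, \beta) \ge 0$, for all $\alpha, \beta \in \mathbb{R}$,
with equality only if $\alpha = \beta$. Let $\mathbf{u}$ and $\mathbf{v}$ be Lebesgue measurable signals. Show that

$$\left( \int_{-\infty}^{\infty} \psi\bigl(u(t), v(t)\bigr) \, dt = 0 \right) \implies (\mathbf{u} \equiv \mathbf{v}).$$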



Exercise 2.9 (Indistinguishable Signals). Show that if the Lebesgue measurable signals $\mathbf{g}$
and $\mathbf{h}$ are indistinguishable, then the set of epochs $t \in \mathbb{R}$ where the sums $\sum_{j=-\infty}^{\infty} g(t + j)$
and $\sum_{j=-\infty}^{\infty} h(t + j)$ are different (in the sense that they both converge but to different
limits or that one converges but the other does not) is of Lebesgue measure zero.

Exercise 2.10 (Continuous Nonnegative Functions). A subset of $\mathbb{R}$ containing a nonempty
open interval cannot be of Lebesgue measure zero. Use this fact to show that if a
continuous function $g\colon \mathbb{R} \to \mathbb{R}$ is nonnegative except perhaps on a set of Lebesgue measure
zero, then the exception set is empty and the function is nonnegative.

Exercise 2.11 (Order of Summation Sometimes Matters). For every $\nu, \eta \in \mathbb{N}$ define

$$a_{\nu,\eta} = \begin{cases} 2 - 2^{-\nu} & \text{if } \nu = \eta, \\ -2 + 2^{-\nu} & \text{if } \nu = \eta + 1, \\ 0 & \text{otherwise.} \end{cases}$$

Show that (2.21) is not satisfied. See (Royden, 1988, Chapter 12, Section 4, Exercise 24).

Exercise 2.12 (Using Fubini's Theorem). Using the relation

$$\frac{1}{x} = \int_0^{\infty} e^{-xt} \, dt, \quad x > 0$$

and Fubini's Theorem, show that

$$\lim_{a \to \infty} \int_0^{a} \frac{\sin x}{x} \, dx = \frac{\pi}{2}.$$

See (Rudin, 1974, Chapter 7, Exercise 12).
Hint: See also Problem 2.1.



Chapter 3 

The Inner Product 

3.1 The Inner Product 

The inner product is central to Digital Communications, so it is best to introduce 
it early. The motivation will have to wait. 

Recall that $u\colon \mathcal{A} \to \mathcal{B}$ indicates that $u$ (sometimes denoted $u(\cdot)$) is a function
(or mapping) that maps each element in its domain $\mathcal{A}$ to an element in its
range $\mathcal{B}$. If both the domain and the range of $u$ are the set of real numbers $\mathbb{R}$,
then we sometimes refer to $u$ as being a real signal, especially if the argument of
$u(\cdot)$ stands for time. Similarly, if $u\colon \mathbb{R} \to \mathbb{C}$, where $\mathbb{C}$ denotes the set of complex
numbers and the argument of $u(\cdot)$ stands for time, then we sometimes refer to $u$
as a complex signal.

The inner product between two real functions $\mathbf{u}\colon \mathbb{R} \to \mathbb{R}$ and $\mathbf{v}\colon \mathbb{R} \to \mathbb{R}$ is
denoted by $\langle \mathbf{u}, \mathbf{v} \rangle$ and is defined as

$$\langle \mathbf{u}, \mathbf{v} \rangle \triangleq \int_{-\infty}^{\infty} u(t) \, v(t) \, dt, \tag{3.1}$$

whenever the integral is defined. (In Section 3.2 we shall study conditions
under which the integral is defined, i.e., conditions on the functions $\mathbf{u}$ and $\mathbf{v}$ that
guarantee that the product function $t \mapsto u(t) \, v(t)$ is an integrable function.)

The signals that arise in our study of Digital Communications often represent 
electric fields or voltages over resistors. The energy required to generate them is 
thus proportional to the integral of their squared magnitude. This motivates us to 
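define the energy of a Lebesgue measurable real-valued function $\mathbf{u}\colon \mathbb{R} \to \mathbb{R}$ as

$$\int_{-\infty}^{\infty} u^2(t) \, dt.$$

(If this integral is not finite, then we say that $\mathbf{u}$ is of infinite energy.) We say that
$\mathbf{u}\colon \mathbb{R} \to \mathbb{R}$ is of finite energy if it is Lebesgue measurable and if

$$\int_{-\infty}^{\infty} u^2(t) \, dt < \infty.$$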
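For signals given in closed form, the inner product and the energy can be approximated by a Riemann sum over a truncated time axis. The following is a minimal sketch under stated assumptions: the signals `u` and `v`, the truncation to $[-20, 20]$, and the helper names `inner_product` and `energy` are hypothetical, and the midpoint rule merely stands in for the Lebesgue integral of these well-behaved functions.

```python
import math

def inner_product(u, v, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of u(t) v(t) over [a, b]."""
    dt = (b - a) / n
    return sum(u(a + (k + 0.5) * dt) * v(a + (k + 0.5) * dt) for k in range(n)) * dt

def energy(u, a, b, n=100_000):
    """Energy of a real signal: its inner product with itself."""
    return inner_product(u, u, a, b, n)

u = lambda t: math.exp(-abs(t))               # energy is the integral of e^{-2|t|}, i.e. 1
v = lambda t: 1.0 if 0.0 <= t <= 1.0 else 0.0 # unit-height pulse on [0, 1]

energy_u = energy(u, -20.0, 20.0)             # close to 1
ip_uv = inner_product(u, v, -20.0, 20.0)      # close to 1 - e^{-1}
print(energy_u, ip_uv)
```

Truncating the time axis is harmless here only because the chosen signals decay quickly or vanish outside the interval.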

The class of all finite-energy real-valued functions $\mathbf{u}\colon \mathbb{R} \to \mathbb{R}$ is denoted by $\mathcal{L}_2$.

Since the energy of $\mathbf{u}\colon \mathbb{R} \to \mathbb{R}$ is nonnegative, we can discuss its nonnegative square
root, which we denote (see footnote 1) by $\|\mathbf{u}\|_2$:

$$\|\mathbf{u}\|_2 \triangleq \sqrt{\int_{-\infty}^{\infty} u^2(t) \, dt}. \tag{3.2}$$

(Throughout this book we denote by $\sqrt{\xi}$ the nonnegative square root of $\xi$ for every
$\xi \ge 0$.) We can now express the energy in $\mathbf{u}$ using the inner product as

$$\|\mathbf{u}\|_2^2 = \int_{-\infty}^{\infty} u^2(t) \, dt = \langle \mathbf{u}, \mathbf{u} \rangle. \tag{3.3}$$

In writing $\|\mathbf{u}\|_2^2$ above we used different fonts for the subscript and the superscript.
The subscript is just a graphical character which is part of the notation $\|\cdot\|_2$. We
could have replaced it with ♦ and designated the energy by $\|\mathbf{u}\|_{\blacklozenge}^2$ without any
change in mathematical meaning (see footnote 2). The superscript, however, indicates that the
quantity $\|\mathbf{u}\|_2$ is being squared.

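For complex-valued functions $\mathbf{u}\colon \mathbb{R} \to \mathbb{C}$ and $\mathbf{v}\colon \mathbb{R} \to \mathbb{C}$ we define the inner product
$\langle \mathbf{u}, \mathbf{v} \rangle$ by

$$\langle \mathbf{u}, \mathbf{v} \rangle \triangleq \int_{-\infty}^{\infty} u(t) \, v^*(t) \, dt, \tag{3.4}$$

whenever the integral is defined. Here $v^*(t)$ denotes the complex conjugate of $v(t)$.
The above integral in (3.4) is a complex integral, but that should not worry you:
it can also be written as

$$\langle \mathbf{u}, \mathbf{v} \rangle = \int_{-\infty}^{\infty} \operatorname{Re}\bigl(u(t) \, v^*(t)\bigr) \, dt + \mathrm{i} \int_{-\infty}^{\infty} \operatorname{Im}\bigl(u(t) \, v^*(t)\bigr) \, dt, \tag{3.5}$$

where $\mathrm{i} = \sqrt{-1}$ and where $\operatorname{Re}(\cdot)$ and $\operatorname{Im}(\cdot)$ denote the functions that map a complex
number to its real and imaginary parts: $\operatorname{Re}(a + \mathrm{i}b) = a$ and $\operatorname{Im}(a + \mathrm{i}b) = b$ whenever
$a, b \in \mathbb{R}$. Each of the two integrals appearing in (3.5) is the integral of a real signal.
See Section 2.3.

Note that (3.1) and (3.4) are in agreement in the sense that if $\mathbf{u}$ and $\mathbf{v}$ happen
to take on only real values (i.e., satisfy that $u(t), v(t) \in \mathbb{R}$ for every $t \in \mathbb{R}$), then
viewing them as real functions and thus using (3.1) would yield the same inner
product as viewing them as (degenerate) complex functions and using (3.4). Note
also that for complex functions $\mathbf{u}, \mathbf{v}\colon \mathbb{R} \to \mathbb{C}$ the inner product $\langle \mathbf{u}, \mathbf{v} \rangle$ is in general
not the same as $\langle \mathbf{v}, \mathbf{u} \rangle$. One is the complex conjugate of the other.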



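1 The subscript 2 is here to distinguish $\|\mathbf{u}\|_2$ from $\|\mathbf{u}\|_1$, where the latter was defined in (2.6)
as $\|\mathbf{u}\|_1 = \int_{-\infty}^{\infty} |u(t)| \, dt$.

2 We prefer $\|\cdot\|_2$ to $\|\cdot\|_{\blacklozenge}$ because it reminds us that in the definition (3.2) the integrand is
raised to the second power. This should be contrasted with the symbol $\|\cdot\|_1$ where the integrand
is raised to the first power (and where no square root is taken of the result); see (2.6).

Some of the properties of the inner product between complex-valued functions
$\mathbf{u}, \mathbf{v}\colon \mathbb{R} \to \mathbb{C}$ are given below.

$$\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle^* \tag{3.6}$$

$$\langle \alpha \mathbf{u}, \mathbf{v} \rangle = \alpha \langle \mathbf{u}, \mathbf{v} \rangle, \quad \alpha \in \mathbb{C} \tag{3.7}$$

$$\langle \mathbf{u}, \alpha \mathbf{v} \rangle = \alpha^* \langle \mathbf{u}, \mathbf{v} \rangle, \quad \alpha \in \mathbb{C} \tag{3.8}$$

$$\langle \mathbf{u}_1 + \mathbf{u}_2, \mathbf{v} \rangle = \langle \mathbf{u}_1, \mathbf{v} \rangle + \langle \mathbf{u}_2, \mathbf{v} \rangle \tag{3.9}$$

$$\langle \mathbf{u}, \mathbf{v}_1 + \mathbf{v}_2 \rangle = \langle \mathbf{u}, \mathbf{v}_1 \rangle + \langle \mathbf{u}, \mathbf{v}_2 \rangle \tag{3.10}$$

The above equalities hold whenever the inner products appearing on the right-hand
side (RHS) are defined. The reader is encouraged to produce a similar list of
properties for the inner product between real-valued functions $\mathbf{u}, \mathbf{v}\colon \mathbb{R} \to \mathbb{R}$.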
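The listed properties (conjugate symmetry, scaling in each argument, and additivity in each argument) also hold for the discrete sum that one obtains when sampling the signals, so they are easy to test numerically. The sketch below is an illustration under assumptions: the random sampled signals, the step `dt`, the scalar `alpha`, and the helper names `ip`, `scale`, and `add` are all hypothetical, and the sum is only a stand-in for the integral defining the inner product.

```python
import random

random.seed(0)
N, dt = 1000, 0.01   # number of samples and sampling step (illustrative)

def rand_signal():
    """A hypothetical sampled complex signal."""
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]

def ip(x, y):
    """Discrete stand-in for the inner product: sum of x(t) y*(t) dt."""
    return sum(a * b.conjugate() for a, b in zip(x, y)) * dt

def scale(c, x):
    return [c * a for a in x]

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def close(z1, z2):
    return abs(z1 - z2) < 1e-9

u, v, w = rand_signal(), rand_signal(), rand_signal()
alpha = 2 - 3j

checks = {
    "conjugate symmetry": close(ip(u, v), ip(v, u).conjugate()),
    "scaling, first argument": close(ip(scale(alpha, u), v), alpha * ip(u, v)),
    "scaling, second argument": close(ip(u, scale(alpha, v)), alpha.conjugate() * ip(u, v)),
    "additivity, first argument": close(ip(add(u, w), v), ip(u, v) + ip(w, v)),
    "additivity, second argument": close(ip(u, add(v, w)), ip(u, v) + ip(u, w)),
}
print(checks)
```

Note that a scalar pulled out of the second argument is conjugated, which is the one place where the complex and real cases differ.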

The energy in a Lebesgue measurable complex-valued function $\mathbf{u}\colon \mathbb{R} \to \mathbb{C}$ is
defined as

$$\int_{-\infty}^{\infty} |u(t)|^2 \, dt,$$

where $|\cdot|$ denotes absolute value so $|a + \mathrm{i}b| = \sqrt{a^2 + b^2}$ whenever $a, b \in \mathbb{R}$. This
definition of energy might seem a bit contrived because there is no such thing 
as complex voltage, so prima facie it seems meaningless to define the energy of 
a complex signal. But this is not the case. Complex signals are used to repre- 
sent real passband signals, and the representation is such that the energy in the 
real passband signal is proportional to the integral of the squared modulus of the 
complex-valued signal representing it; see Section 7.6 ahead. 



Definition 3.1.1 (Energy-Limited Signal). We say that $\mathbf{u}\colon \mathbb{R} \to \mathbb{C}$ is
energy-limited or of finite energy if $\mathbf{u}$ is Lebesgue measurable and

$$\int_{-\infty}^{\infty} |u(t)|^2 \, dt < \infty.$$

The set of all energy-limited complex-valued functions $\mathbf{u}\colon \mathbb{R} \to \mathbb{C}$ is denoted by $\mathcal{L}_2$.
Note that whether $\mathcal{L}_2$ stands for the class of energy-limited complex-valued or
real-valued functions should be clear from the context, or else immaterial.

For every $\mathbf{u} \in \mathcal{L}_2$ we define $\|\mathbf{u}\|_2$ as the nonnegative square root of its energy

$$\|\mathbf{u}\|_2 \triangleq \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle}, \tag{3.11}$$

so

$$\|\mathbf{u}\|_2 = \sqrt{\int_{-\infty}^{\infty} |u(t)|^2 \, dt}. \tag{3.12}$$

Again (3.12) and (3.2) are in agreement in the sense that for every $\mathbf{u}\colon \mathbb{R} \to \mathbb{R}$,
computing $\|\mathbf{u}\|_2$ via (3.2) yields the same result as if we viewed $\mathbf{u}$ as mapping
from $\mathbb{R}$ to $\mathbb{C}$ and computed $\|\mathbf{u}\|_2$ via (3.12).

3.2 When Is the Inner Product Defined?

As noted in Section 2.2, in this book we shall only discuss the integral of integrable
functions, where a function $\mathbf{u}\colon \mathbb{R} \to \mathbb{R}$ is integrable if it is Lebesgue measurable
and if $\int_{-\infty}^{\infty} |u(t)| \, dt < \infty$. (We shall sometimes make an exception for functions
that take on only nonnegative values. If $\mathbf{u}\colon \mathbb{R} \to [0, \infty)$ is Lebesgue measurable
and if $\int u(t) \, dt$ is not finite, then we shall say that $\int u(t) \, dt = +\infty$.)

Similarly, as in Section 2.3, in integrating complex signals $\mathbf{u}\colon \mathbb{R} \to \mathbb{C}$ we limit
ourselves to signals that are integrable in the sense that both $t \mapsto \operatorname{Re}(u(t))$ and
$t \mapsto \operatorname{Im}(u(t))$ are Lebesgue measurable real-valued signals and $\int_{-\infty}^{\infty} |u(t)| \, dt < \infty$.

Consequently, we shall say that the inner product between $\mathbf{u}\colon \mathbb{R} \to \mathbb{C}$ and $\mathbf{v}\colon \mathbb{R} \to \mathbb{C}$
is well-defined only when they are both Lebesgue measurable (thus implying that
$t \mapsto u(t) \, v^*(t)$ is Lebesgue measurable) and when

$$\int_{-\infty}^{\infty} |u(t) \, v(t)| \, dt < \infty. \tag{3.13}$$

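We next discuss conditions on the Lebesgue measurable complex signals $\mathbf{u}$ and $\mathbf{v}$
that guarantee that (3.13) holds. The simplest case is when one of the functions,
say $\mathbf{u}$, is bounded and the other, say $\mathbf{v}$, is integrable. Indeed, if $\sigma_\infty \in \mathbb{R}$ is such
that $|u(t)| \le \sigma_\infty$ for all $t \in \mathbb{R}$, then $|u(t) \, v(t)| \le \sigma_\infty |v(t)|$ and

$$\int_{-\infty}^{\infty} |u(t) \, v(t)| \, dt \le \sigma_\infty \int_{-\infty}^{\infty} |v(t)| \, dt = \sigma_\infty \|\mathbf{v}\|_1,$$

where the RHS is finite by our assumption that $\mathbf{v}$ is integrable.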

Another case where the inner product is well-defined is when both $\mathbf{u}$ and $\mathbf{v}$ are of
finite energy. To prove that in this case too the mapping $t \mapsto u(t) \, v(t)$ is integrable
we need the inequality

$$\alpha\beta \le \frac{1}{2}(\alpha^2 + \beta^2), \quad \alpha, \beta \in \mathbb{R}, \tag{3.14}$$

which follows directly from the inequality $(\alpha - \beta)^2 \ge 0$ by simple algebra:

$$0 \le (\alpha - \beta)^2 = \alpha^2 + \beta^2 - 2\alpha\beta.$$

By substituting $|u(t)|$ for $\alpha$ and $|v(t)|$ for $\beta$ in (3.14) we obtain the inequality
$|u(t) \, v(t)| \le (|u(t)|^2 + |v(t)|^2)/2$ and hence

$$\int_{-\infty}^{\infty} |u(t) \, v(t)| \, dt \le \frac{1}{2} \int_{-\infty}^{\infty} |u(t)|^2 \, dt + \frac{1}{2} \int_{-\infty}^{\infty} |v(t)|^2 \, dt, \tag{3.15}$$

thus demonstrating that if both $\mathbf{u}$ and $\mathbf{v}$ are of finite energy (so the RHS is finite),
then the inner product is well-defined, i.e., $t \mapsto u(t) \, v(t)$ is integrable.

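As a by-product of this proof we can obtain an upper bound on the magnitude of
the inner product in terms of the energies of $\mathbf{u}$ and $\mathbf{v}$. All we need is the inequality

$$\left| \int_{-\infty}^{\infty} f(\xi) \, d\xi \right| \le \int_{-\infty}^{\infty} |f(\xi)| \, d\xi$$

(see Proposition 2.4.1) to conclude from (3.15) that

$$\begin{aligned}
|\langle \mathbf{u}, \mathbf{v} \rangle| &= \left| \int_{-\infty}^{\infty} u(t) \, v^*(t) \, dt \right| \\
&\le \int_{-\infty}^{\infty} |u(t)| \, |v(t)| \, dt \\
&\le \frac{1}{2} \int_{-\infty}^{\infty} |u(t)|^2 \, dt + \frac{1}{2} \int_{-\infty}^{\infty} |v(t)|^2 \, dt.
\end{aligned} \tag{3.16}$$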



This inequality will be improved in Theorem 3.3.1, which introduces the Cauchy- 
Schwarz Inequality. 

We finally mention here, without proof, a third case where the inner product
between the Lebesgue measurable signals $\mathbf{u}, \mathbf{v}$ is defined. The result here is that if
for some numbers $1 < p, q < \infty$ satisfying $1/p + 1/q = 1$ we have that

$$\int_{-\infty}^{\infty} |u(t)|^p \, dt < \infty \quad \text{and} \quad \int_{-\infty}^{\infty} |v(t)|^q \, dt < \infty,$$

then $t \mapsto u(t) \, v(t)$ is integrable. The proof of this result follows from Hölder's
Inequality; see Theorem 3.3.2. Notice that the second case we addressed (where $\mathbf{u}$
and $\mathbf{v}$ are both of finite energy) follows from this case by considering $p = q = 2$.

3.3 The Cauchy-Schwarz Inequality 

The Cauchy-Schwarz Inequality is probably the most important inequality on the 
inner product. Its discrete version is attributed to Augustin-Louis Cauchy (1789- 
1857) and its integral form to Victor Yacovlevich Bunyakovsky (1804-1889) who 
studied with him in Paris. Its (double) integral form was derived independently by 
Hermann Amandus Schwarz (1843-1921). See (Steele, 2004, pp. 10-12) for more 
on the history of this inequality and on how inequalities get their names. 

Theorem 3.3.1 (Cauchy-Schwarz Inequality). If the functions $\mathbf{u}, \mathbf{v}\colon \mathbb{R} \to \mathbb{C}$ are
of finite energy, then the mapping $t \mapsto u(t) \, v^*(t)$ is integrable and

$$|\langle \mathbf{u}, \mathbf{v} \rangle| \le \|\mathbf{u}\|_2 \, \|\mathbf{v}\|_2. \tag{3.17}$$

That is,

$$\left| \int_{-\infty}^{\infty} u(t) \, v^*(t) \, dt \right| \le \sqrt{\int_{-\infty}^{\infty} |u(t)|^2 \, dt} \; \sqrt{\int_{-\infty}^{\infty} |v(t)|^2 \, dt}.$$

Equality in the Cauchy-Schwarz Inequality is possible, e.g., if $\mathbf{u}$ is a scaled version
of $\mathbf{v}$, i.e., if for some constant $\alpha$

$$u(t) = \alpha v(t), \quad t \in \mathbb{R}.$$

In fact, the Cauchy-Schwarz Inequality holds with equality if, and only if, either $v(t)$
is zero for all $t$ outside a set of Lebesgue measure zero or for some constant $\alpha$ we
have $u(t) = \alpha v(t)$ for all $t$ outside a set of Lebesgue measure zero.

There are a number of different proofs of this important inequality. We shall focus 
here on one that is based on (3.16) because it demonstrates a general technique for 
improving inequalities. The idea is that once one obtains a certain inequality — in 
our case (3.16) — one can try to improve it by taking advantage of one's under- 
standing of how the quantity in question is affected by various transformations. 
This technique is beautifully illustrated in (Steele, 2004). 
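The effect of this improvement technique can be previewed numerically. The sketch below applies the weaker bound (half the sum of the two energies) to scaled versions of two signals and then undoes the scaling; with the scale factors chosen as the reciprocals of the signals' root energies, the result coincides with the Cauchy-Schwarz bound. Everything specific here is an assumption made for illustration: the sampled signals, the grid, and the helper name `ip`, with a Riemann sum standing in for the integral.

```python
import cmath
import math

dt = 0.001
ts = [k * dt for k in range(10_000)]              # grid on [0, 10)
u = [5 * math.exp(-t) for t in ts]                # hypothetical real signal
v = [cmath.exp(3j * t) / (1 + t) for t in ts]     # hypothetical complex signal

def ip(x, y):
    """Riemann-sum stand-in for the inner product of x and y."""
    return sum(a * b.conjugate() for a, b in zip(x, y)) * dt

energy_u = ip(u, u).real
energy_v = ip(v, v).real

lhs = abs(ip(u, v))                               # magnitude of the inner product
weak = 0.5 * (energy_u + energy_v)                # bound before the improvement
cs = math.sqrt(energy_u * energy_v)               # Cauchy-Schwarz bound

# Apply the weak bound to alpha*u and beta*v, then divide by |alpha| |beta|.
alpha, beta = 1 / math.sqrt(energy_u), 1 / math.sqrt(energy_v)
su = [alpha * a for a in u]
sv = [beta * b for b in v]
improved = 0.5 * (ip(su, su).real + ip(sv, sv).real) / (alpha * beta)

print(lhs, improved, cs, weak)
```

The scaled signals both have unit energy, so the weak bound evaluates to one for them, and dividing by the product of the scale factors recovers the product of the root energies.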

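Proof. The quantity in question is $|\langle \mathbf{u}, \mathbf{v} \rangle|$. We shall take advantage of our
understanding of how this quantity behaves when we replace $\mathbf{u}$ with its scaled version
$\alpha\mathbf{u}$ and when we replace $\mathbf{v}$ with its scaled version $\beta\mathbf{v}$. Here $\alpha, \beta \in \mathbb{C}$ are arbitrary.
The quantity in question transforms as

$$|\langle \alpha\mathbf{u}, \beta\mathbf{v} \rangle| = |\alpha| \, |\beta| \, |\langle \mathbf{u}, \mathbf{v} \rangle|. \tag{3.18}$$

We now use (3.16) to upper-bound the left-hand side (LHS) of the above by
substituting $\alpha\mathbf{u}$ and $\beta\mathbf{v}$ for $\mathbf{u}$ and $\mathbf{v}$ in (3.16) to obtain

$$|\alpha| \, |\beta| \, |\langle \mathbf{u}, \mathbf{v} \rangle| = |\langle \alpha\mathbf{u}, \beta\mathbf{v} \rangle| \le \frac{1}{2} |\alpha|^2 \|\mathbf{u}\|_2^2 + \frac{1}{2} |\beta|^2 \|\mathbf{v}\|_2^2, \quad \alpha, \beta \in \mathbb{C}. \tag{3.19}$$

If both $\|\mathbf{u}\|_2$ and $\|\mathbf{v}\|_2$ are positive, then (3.17) follows from (3.19) by choosing
$\alpha = 1/\|\mathbf{u}\|_2$ and $\beta = 1/\|\mathbf{v}\|_2$. To conclude the proof it thus remains to show that
(3.17) also holds when either $\|\mathbf{u}\|_2$ or $\|\mathbf{v}\|_2$ is zero so the RHS of (3.17) is zero.
That is, we need to show that if either $\|\mathbf{u}\|_2$ or $\|\mathbf{v}\|_2$ is zero, then $\langle \mathbf{u}, \mathbf{v} \rangle$ must also
be zero. To show this, suppose first that $\|\mathbf{u}\|_2$ is zero. By substituting $\alpha = 1$ in
(3.19) we obtain in this case that

$$|\beta| \, |\langle \mathbf{u}, \mathbf{v} \rangle| \le \frac{1}{2} |\beta|^2 \|\mathbf{v}\|_2^2,$$

which, upon dividing by $|\beta|$, yields

$$|\langle \mathbf{u}, \mathbf{v} \rangle| \le \frac{1}{2} |\beta| \, \|\mathbf{v}\|_2^2, \quad \beta \neq 0.$$

Upon letting $|\beta|$ tend to zero from above this demonstrates that $\langle \mathbf{u}, \mathbf{v} \rangle$ must be zero
as we set out to prove. (As an alternative proof of this case one notes that $\|\mathbf{u}\|_2 = 0$
implies, by Proposition 2.5.3, that the set $\{t \in \mathbb{R} : u(t) \neq 0\}$ is of Lebesgue measure
zero. Consequently, since every zero of $t \mapsto u(t)$ is also a zero of $t \mapsto u(t) \, v^*(t)$,
it follows that $\{t \in \mathbb{R} : u(t) \, v^*(t) \neq 0\}$ is included in $\{t \in \mathbb{R} : u(t) \neq 0\}$, and
must therefore also be of Lebesgue measure zero (Exercise 2.6). Consequently, by
Proposition 2.5.3, $\int_{-\infty}^{\infty} |u(t) \, v^*(t)| \, dt$ must be zero, which, by Proposition 2.4.1,
implies that $|\langle \mathbf{u}, \mathbf{v} \rangle|$ must be zero.)

The case where $\|\mathbf{v}\|_2 = 0$ is very similar: by substituting $\beta = 1$ in (3.19) we obtain
that (in this case)

$$|\langle \mathbf{u}, \mathbf{v} \rangle| \le \frac{1}{2} |\alpha| \, \|\mathbf{u}\|_2^2, \quad \alpha \neq 0,$$

and the result follows upon letting $|\alpha|$ tend to zero from above. □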

While we shall not use the following inequality in this book, it is sufficiently
important that we mention it in passing.

Theorem 3.3.2 (Hölder's Inequality). If $\mathbf{u}\colon \mathbb{R} \to \mathbb{C}$ and $\mathbf{v}\colon \mathbb{R} \to \mathbb{C}$ are Lebesgue
measurable functions satisfying

$$\int_{-\infty}^{\infty} |u(t)|^p \, dt < \infty \quad \text{and} \quad \int_{-\infty}^{\infty} |v(t)|^q \, dt < \infty$$

for some $1 < p, q < \infty$ satisfying $1/p + 1/q = 1$, then the function $t \mapsto u(t) \, v^*(t)$ is
integrable and

$$\left| \int_{-\infty}^{\infty} u(t) \, v^*(t) \, dt \right| \le \left( \int_{-\infty}^{\infty} |u(t)|^p \, dt \right)^{1/p} \left( \int_{-\infty}^{\infty} |v(t)|^q \, dt \right)^{1/q}. \tag{3.20}$$

Note that the Cauchy-Schwarz Inequality corresponds to the case where $p = q = 2$.

Proof. See, for example, (Rudin, 1974, Theorem 3.5) or (Royden, 1988, Section
6.2). □



3.4 Applications 

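There are numerous applications of the Cauchy-Schwarz Inequality. Here we only
mention a few. The first relates the energy in the superposition of two signals to
the energies of the individual signals. The result holds for both complex-valued and
real-valued functions, and, as is our custom, we shall thus not make the range
explicit.

Proposition 3.4.1 (Triangle Inequality for $\mathcal{L}_2$). If $\mathbf{u}$ and $\mathbf{v}$ are in $\mathcal{L}_2$, then

$$\|\mathbf{u} + \mathbf{v}\|_2 \le \|\mathbf{u}\|_2 + \|\mathbf{v}\|_2. \tag{3.21}$$

Proof. The proof is a straightforward application of the Cauchy-Schwarz Inequality
and the basic properties of the inner product (3.6)-(3.9):

$$\begin{aligned}
\|\mathbf{u} + \mathbf{v}\|_2^2 &= \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle \\
&= \langle \mathbf{u}, \mathbf{u} \rangle + \langle \mathbf{v}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{u} \rangle \\
&\le \langle \mathbf{u}, \mathbf{u} \rangle + \langle \mathbf{v}, \mathbf{v} \rangle + |\langle \mathbf{u}, \mathbf{v} \rangle| + |\langle \mathbf{v}, \mathbf{u} \rangle| \\
&= \|\mathbf{u}\|_2^2 + \|\mathbf{v}\|_2^2 + 2 |\langle \mathbf{u}, \mathbf{v} \rangle| \\
&\le \|\mathbf{u}\|_2^2 + \|\mathbf{v}\|_2^2 + 2 \|\mathbf{u}\|_2 \|\mathbf{v}\|_2 \\
&= \bigl( \|\mathbf{u}\|_2 + \|\mathbf{v}\|_2 \bigr)^2,
\end{aligned}$$

from which the result follows by taking square roots. Here the first line follows
from the definition of $\|\cdot\|_2$ (3.11); the second by (3.9) & (3.10); the third by the
Triangle Inequality for Complex Numbers (2.12); the fourth because, by (3.6),
$\langle \mathbf{v}, \mathbf{u} \rangle$ is the complex conjugate of $\langle \mathbf{u}, \mathbf{v} \rangle$ and is hence of equal modulus; the fifth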
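by the Cauchy-Schwarz Inequality; and the sixth by simple algebra. □

Another important mathematical consequence of the Cauchy-Schwarz Inequality is
the continuity of the inner product. To state the result we use the notation $a_n \to a$
to indicate that the sequence $a_1, a_2, \ldots$ converges to $a$, i.e., that $\lim_{n \to \infty} a_n = a$.

Proposition 3.4.2 (Continuity of the Inner Product). Let $\mathbf{u}$ and $\mathbf{v}$ be in $\mathcal{L}_2$. If
the sequence $\mathbf{u}_1, \mathbf{u}_2, \ldots$ of elements of $\mathcal{L}_2$ satisfies

$$\|\mathbf{u}_n - \mathbf{u}\|_2 \to 0,$$

and if the sequence $\mathbf{v}_1, \mathbf{v}_2, \ldots$ of elements of $\mathcal{L}_2$ satisfies

$$\|\mathbf{v}_n - \mathbf{v}\|_2 \to 0,$$

then

$$\langle \mathbf{u}_n, \mathbf{v}_n \rangle \to \langle \mathbf{u}, \mathbf{v} \rangle.$$

Proof.

$$\begin{aligned}
|\langle \mathbf{u}_n, \mathbf{v}_n \rangle - \langle \mathbf{u}, \mathbf{v} \rangle|
&= |\langle \mathbf{u}_n - \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}_n - \mathbf{u}, \mathbf{v}_n - \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{v}_n - \mathbf{v} \rangle| \\
&\le |\langle \mathbf{u}_n - \mathbf{u}, \mathbf{v} \rangle| + |\langle \mathbf{u}_n - \mathbf{u}, \mathbf{v}_n - \mathbf{v} \rangle| + |\langle \mathbf{u}, \mathbf{v}_n - \mathbf{v} \rangle| \\
&\le \|\mathbf{u}_n - \mathbf{u}\|_2 \|\mathbf{v}\|_2 + \|\mathbf{u}_n - \mathbf{u}\|_2 \|\mathbf{v}_n - \mathbf{v}\|_2 + \|\mathbf{u}\|_2 \|\mathbf{v}_n - \mathbf{v}\|_2 \\
&\to 0,
\end{aligned}$$

where the first equality follows from the basic properties of the inner product
(3.6)-(3.10); the subsequent inequality by the Triangle Inequality for Complex Numbers
(2.12); the subsequent inequality from the Cauchy-Schwarz Inequality; and where
the final limit follows from the proposition's hypotheses. □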

Another useful consequence of the Cauchy-Schwarz Inequality is in demonstrating 
that if a signal is energy-limited and is zero outside an interval, then it is also 
integrable. 

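Proposition 3.4.3 (Finite-Energy Functions over Finite Intervals are Integrable).
If for some real numbers $a$ and $b$ satisfying $a \le b$ we have

$$\int_a^b |x(\xi)|^2 \, d\xi < \infty,$$

then

$$\int_a^b |x(\xi)| \, d\xi \le \sqrt{b - a} \, \sqrt{\int_a^b |x(\xi)|^2 \, d\xi},$$

and, in particular,

$$\int_a^b |x(\xi)| \, d\xi < \infty.$$

Proof.

$$\begin{aligned}
\int_a^b |x(\xi)| \, d\xi &= \int_{-\infty}^{\infty} \operatorname{I}\{a \le \xi \le b\} \, |x(\xi)| \, d\xi \\
&= \int_{-\infty}^{\infty} \underbrace{\operatorname{I}\{a \le \xi \le b\}}_{u(\xi)} \, \underbrace{\operatorname{I}\{a \le \xi \le b\} \, |x(\xi)|}_{v(\xi)} \, d\xi \\
&\le \sqrt{b - a} \, \sqrt{\int_a^b |x(\xi)|^2 \, d\xi},
\end{aligned}$$

where the inequality is just an application of the Cauchy-Schwarz Inequality to the
function $\xi \mapsto \operatorname{I}\{a \le \xi \le b\} \, |x(\xi)|$ and the indicator function $\xi \mapsto \operatorname{I}\{a \le \xi \le b\}$. □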

Note that, in general, an energy-limited signal need not be integrable. For example,
the real signal

$$t \mapsto \begin{cases} 0 & \text{if } t \le 1, \\ 1/t & \text{otherwise,} \end{cases} \tag{3.22}$$

is of finite energy but is not integrable. 
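A quick numerical look at this signal makes the contrast concrete: over $[1, T]$ the energy equals $1 - 1/T$ and stays below one, whereas the integral of the signal equals $\ln T$ and grows without bound. The sketch below approximates both by a midpoint rule; the helper name `midpoint`, the grid size, and the chosen values of $T$ are illustrative assumptions.

```python
import math

def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dt = (b - a) / n
    return sum(f(a + (k + 0.5) * dt) for k in range(n)) * dt

results = []
for T in (10.0, 100.0, 1000.0):
    energy = midpoint(lambda t: 1.0 / t ** 2, 1.0, T)   # exact value: 1 - 1/T
    area = midpoint(lambda t: 1.0 / t, 1.0, T)          # exact value: ln T
    results.append((T, energy, area))
    print(T, energy, area)
```

As $T$ increases, the second column settles near one while the third keeps growing, in line with the claim that the signal is of finite energy but not integrable.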

The Cauchy-Schwarz Inequality demonstrates that if both u and v are of finite 
energy, then their inner product (u, v) is well-defined, i.e., the integrand in (3.4) is 
integrable. It can also be used in slightly more sophisticated ways. For example, it 
can be used to treat cases where one of the functions, say u, is not of finite energy 
but where the second function decays to zero sufficiently quickly to compensate for 
that. For example: 

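Proposition 3.4.4. If the Lebesgue measurable functions $\mathbf{x}\colon \mathbb{R} \to \mathbb{C}$ and $\mathbf{y}\colon \mathbb{R} \to \mathbb{C}$
satisfy

$$\int_{-\infty}^{\infty} \frac{|x(t)|^2}{t^2 + 1} \, dt < \infty$$

and

$$\int_{-\infty}^{\infty} |y(t)|^2 \, (t^2 + 1) \, dt < \infty,$$

then the function $t \mapsto x(t) \, y^*(t)$ is integrable and

$$\left| \int_{-\infty}^{\infty} x(t) \, y^*(t) \, dt \right| \le \sqrt{\int_{-\infty}^{\infty} \frac{|x(t)|^2}{t^2 + 1} \, dt} \; \sqrt{\int_{-\infty}^{\infty} |y(t)|^2 \, (t^2 + 1) \, dt}.$$

Proof. This is a simple application of the Cauchy-Schwarz Inequality to the
functions $t \mapsto x(t)/\sqrt{t^2 + 1}$ and $t \mapsto y(t) \sqrt{t^2 + 1}$. Simply write

$$\int_{-\infty}^{\infty} x(t) \, y^*(t) \, dt = \int_{-\infty}^{\infty} \underbrace{\frac{x(t)}{\sqrt{t^2 + 1}}}_{u(t)} \, \underbrace{\sqrt{t^2 + 1} \, y^*(t)}_{v^*(t)} \, dt$$

and apply the Cauchy-Schwarz Inequality to the functions $u(\cdot)$ and $v(\cdot)$. □

3.5 The Cauchy-Schwarz Inequality for Random Variables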

There is also a version of the Cauchy-Schwarz Inequality for random variables. It is
very similar to Theorem 3.3.1 but with time integrals replaced by expectations. We
denote the expectation of the random variable $X$ by $\mathsf{E}[X]$ and remind the reader
that the variance $\operatorname{Var}[X]$ of the random variable $X$ is defined by

$$\operatorname{Var}[X] = \mathsf{E}\bigl[(X - \mathsf{E}[X])^2\bigr]. \tag{3.23}$$

Theorem 3.5.1 (Cauchy-Schwarz Inequality for Random Variables). Let the
random variables $U$ and $V$ be of finite variance. Then

$$|\mathsf{E}[UV]| \le \sqrt{\mathsf{E}[U^2]} \, \sqrt{\mathsf{E}[V^2]}, \tag{3.24}$$

with equality if, and only if, $\Pr[\alpha U = \beta V] = 1$ for some real $\alpha$ and $\beta$ that are not
both equal to zero.

Proof. Use the proof of Theorem 3.3.1 with all time integrals replaced with
expectations. For a different proof and for the conditions for equality see (Grimmett
and Stirzaker, 2001, Chapter 3, Section 3.5, Theorem 9). □

For the next corollary we need to recall that the covariance $\operatorname{Cov}[U, V]$ between the
finite-variance random variables $U, V$ is defined by

$$\operatorname{Cov}[U, V] = \mathsf{E}\bigl[(U - \mathsf{E}[U])(V - \mathsf{E}[V])\bigr]. \tag{3.25}$$

Corollary 3.5.2 (Covariance Inequality). If the random variables $U$ and $V$ are of
finite variance $\operatorname{Var}[U]$ and $\operatorname{Var}[V]$, then

$$|\operatorname{Cov}[U, V]| \le \sqrt{\operatorname{Var}[U]} \, \sqrt{\operatorname{Var}[V]}. \tag{3.26}$$

Proof. Apply Theorem 3.5.1 to the random variables $U - \mathsf{E}[U]$ and $V - \mathsf{E}[V]$. □

Corollary 3.5.2 shows that the correlation coefficient, which is defined for
random variables $U$ and $V$ having strictly positive variances as

$$\rho = \frac{\operatorname{Cov}[U, V]}{\sqrt{\operatorname{Var}[U]} \, \sqrt{\operatorname{Var}[V]}}, \tag{3.27}$$

satisfies

$$-1 \le \rho \le +1. \tag{3.28}$$
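The covariance inequality and the resulting bound on the correlation coefficient can be illustrated with sample averages, which satisfy the same inequality because the Cauchy-Schwarz Inequality also holds for finite sums. The sketch below is only an illustration: the Gaussian samples, the mixing weights 0.6 and 0.8, the seed, and the helper names `mean` and `cov` are hypothetical, and empirical averages stand in for expectations.

```python
import math
import random

random.seed(1)
n = 10_000
U = [random.gauss(0, 1) for _ in range(n)]
# V is built so that its correlation with U is 0.6 in expectation.
V = [0.6 * u + 0.8 * random.gauss(0, 1) for u in U]

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Empirical covariance: average of the product of the centered samples."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

bound = math.sqrt(cov(U, U) * cov(V, V))   # product of the empirical standard deviations
rho = cov(U, V) / bound                    # empirical correlation coefficient
print(cov(U, V), bound, rho)
```

Whatever samples are drawn, the empirical coefficient lies between minus one and plus one; only its closeness to 0.6 depends on the sample size.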

3.6 Mathematical Comments 

(i) Mathematicians typically consider $\langle \mathbf{u}, \mathbf{v} \rangle$ only when both $\mathbf{u}$ and $\mathbf{v}$ are of finite
energy. We are more forgiving and simply require that the integral defining
the inner product be well-defined, i.e., that the integrand be integrable.

(ii) Some refer to $\|\mathbf{u}\|_2$ as the "norm of $\mathbf{u}$" or the "$\mathcal{L}_2$ norm of $\mathbf{u}$." We shall
refrain from this usage because mathematicians use the term "norm" very
selectively. They require that no function other than the all-zero function be
of zero norm, and this is not the case for $\|\cdot\|_2$. Indeed, any function $\mathbf{u}$ that is
indistinguishable from the all-zero function satisfies $\|\mathbf{u}\|_2 = 0$, and there are
many such functions (e.g., the function that is equal to one at rational times
and that is equal to zero at all other times). This difficulty can be overcome
by defining two functions to be the same if their difference is of zero energy.
In this case $\|\cdot\|_2$ is a norm in the mathematical sense and is, in fact, what
mathematicians call the $L_2$ norm. This issue is discussed in greater detail in
Section 4.7. To stay out of trouble we shall refrain from giving $\|\cdot\|_2$ a name.



3.7 Exercises 

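Exercise 3.1 (Manipulating Inner Products). Show that if $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ are energy-limited
complex signals, then

$$\langle \mathbf{u} + \mathbf{v}, 3\mathbf{u} + \mathbf{v} + \mathrm{i}\mathbf{w} \rangle = 3 \|\mathbf{u}\|_2^2 + \|\mathbf{v}\|_2^2 + \langle \mathbf{u}, \mathbf{v} \rangle + 3 \langle \mathbf{u}, \mathbf{v} \rangle^* - \mathrm{i} \langle \mathbf{u}, \mathbf{w} \rangle - \mathrm{i} \langle \mathbf{v}, \mathbf{w} \rangle.$$

Exercise 3.2 (Orthogonality to All Signals). Let $\mathbf{u}$ be an energy-limited signal. Show
that

$$(\mathbf{u} \equiv \mathbf{0}) \iff \bigl( \langle \mathbf{u}, \mathbf{v} \rangle = 0, \quad \mathbf{v} \in \mathcal{L}_2 \bigr).$$

Exercise 3.3 (Finite-Energy Signals). Let $\mathbf{x}$ be an energy-limited signal.

(i) Show that, for every $t_0 \in \mathbb{R}$, the signal $t \mapsto x(t - t_0)$ must also be energy-limited.

(ii) Show that the reflection of $\mathbf{x}$ is also energy-limited. I.e., show that the signal $\overleftarrow{\mathbf{x}}$
that maps $t$ to $x(-t)$ is energy-limited.

(iii) How are the energies in $t \mapsto x(t)$, $t \mapsto x(t - t_0)$, and $t \mapsto x(-t)$ related?

Exercise 3.4 (Inner Products of Mirror Images). Express the inner product $\langle \overleftarrow{\mathbf{x}}, \overleftarrow{\mathbf{y}} \rangle$ in
terms of the inner product $\langle \mathbf{x}, \mathbf{y} \rangle$.

Exercise 3.5 (On the Cauchy-Schwarz Inequality). Show that the bound obtained from
the Cauchy-Schwarz Inequality is at least as tight as (3.16).

Exercise 3.6 (Truncated Polynomials). Consider the signals $\mathbf{u}\colon t \mapsto (t + 2) \operatorname{I}\{0 \le t \le 1\}$
and $\mathbf{v}\colon t \mapsto (t^2 - 2t - 3) \operatorname{I}\{0 \le t \le 1\}$. Compute the energies $\|\mathbf{u}\|_2^2$ & $\|\mathbf{v}\|_2^2$ and the inner
product $\langle \mathbf{u}, \mathbf{v} \rangle$.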
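Exercise 3.7 (Indistinguishability and Inner Products). Let $\mathbf{u} \in \mathcal{L}_2$ be indistinguishable
from $\mathbf{u}' \in \mathcal{L}_2$, and let $\mathbf{v} \in \mathcal{L}_2$ be indistinguishable from $\mathbf{v}' \in \mathcal{L}_2$. Show that the inner
product $\langle \mathbf{u}', \mathbf{v}' \rangle$ is equal to the inner product $\langle \mathbf{u}, \mathbf{v} \rangle$.

Exercise 3.8 (Finite Energy and Integrability). Let $\mathbf{x}\colon \mathbb{R} \to \mathbb{C}$ be Lebesgue measurable.

(i) Show that the conditions that $\mathbf{x}$ is of finite energy and that the mapping $t \mapsto t \, x(t)$
is of finite energy are simultaneously met if, and only if,

$$\int_{-\infty}^{\infty} |x(t)|^2 \, (1 + t^2) \, dt < \infty. \tag{3.29}$$

(ii) Show that (3.29) implies that $\mathbf{x}$ is integrable.

(iii) Give an example of an integrable signal that does not satisfy (3.29).

Exercise 3.9 (The Cauchy-Schwarz Inequality for Sequences).

(i) Let the complex sequences $a_1, a_2, \ldots$ and $b_1, b_2, \ldots$ satisfy

$$\sum_{\nu=1}^{\infty} |a_\nu|^2, \; \sum_{\nu=1}^{\infty} |b_\nu|^2 < \infty.$$

Show that

$$\left| \sum_{\nu=1}^{\infty} a_\nu b_\nu^* \right|^2 \le \left( \sum_{\nu=1}^{\infty} |a_\nu|^2 \right) \left( \sum_{\nu=1}^{\infty} |b_\nu|^2 \right).$$

(ii) Derive the Cauchy-Schwarz Inequality for $d$-tuples:

$$\left| \sum_{\nu=1}^{d} a_\nu b_\nu^* \right|^2 \le \left( \sum_{\nu=1}^{d} |a_\nu|^2 \right) \left( \sum_{\nu=1}^{d} |b_\nu|^2 \right).$$

Exercise 3.10 (Summability and Square Summability). Let $a_1, a_2, \ldots$ be a sequence of
complex numbers. Show that

$$\left( \sum_{\nu=1}^{\infty} |a_\nu| < \infty \right) \implies \left( \sum_{\nu=1}^{\infty} |a_\nu|^2 < \infty \right).$$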
Exercise 3.11 (A Friendlier GPA). Use the Cauchy-Schwarz Inequality for $d$-tuples
(Problem 3.9) to show that for any positive integer $d$,

$$\frac{a_1 + \cdots + a_d}{d} \le \sqrt{\frac{a_1^2 + \cdots + a_d^2}{d}}, \quad a_1, \ldots, a_d \in \mathbb{R}.$$



Chapter 4 

The Space $\mathcal{L}_2$ of Energy-Limited Signals

4.1 Introduction

In this chapter we shall study the space $\mathcal{L}_2$ of energy-limited signals in greater
detail. We shall show that its elements can be viewed as vectors in a vector space
and begin developing a geometric intuition for understanding its structure. We
shall focus on the case of complex-valued signals, but with some minor changes the
results are also applicable to real-valued signals. (The main changes that are needed
for translating the results to real-valued signals are replacing $\mathbb{C}$ with $\mathbb{R}$, ignoring
the conjugation operation, and interpreting $|\cdot|$ as the absolute value function for
real arguments as opposed to the modulus function.)

We remind the reader that the space $\mathcal{L}_2$ was defined in Definition 3.1.1 as the set
of all Lebesgue measurable complex-valued signals $u\colon \mathbb{R} \to \mathbb{C}$ satisfying

$\int_{-\infty}^{\infty} |u(t)|^2\, dt < \infty,$ (4.1)

and that in (3.12) we defined for every $u \in \mathcal{L}_2$ the quantity $\|u\|_2$ as

$\|u\|_2 = \sqrt{\int_{-\infty}^{\infty} |u(t)|^2\, dt}.$ (4.2)

We refer to $\mathcal{L}_2$ as the space of energy-limited signals and to its elements as
energy-limited signals or signals of finite energy.



4.2 $\mathcal{L}_2$ as a Vector Space

In this section we shall explain how to view the space $\mathcal{L}_2$ as a vector space over
the complex field by thinking about signals in $\mathcal{L}_2$ as vectors, by interpreting the
superposition u + v of two signals as vector-addition, and by interpreting the
amplification of u by $\alpha$ as the operation of multiplying the vector u by the scalar
$\alpha \in \mathbb{C}$.

We begin by reminding the reader that the superposition of the two signals u
and v is denoted by u + v and is the signal that maps every $t \in \mathbb{R}$ to u(t) + v(t).
The amplification of u by $\alpha$ is denoted by $\alpha u$ and is the signal that maps every
$t \in \mathbb{R}$ to $\alpha u(t)$. More generally, if u and v are signals and if $\alpha$ and $\beta$ are complex
numbers, then $\alpha u + \beta v$ is the signal $t \mapsto \alpha u(t) + \beta v(t)$.

If $u \in \mathcal{L}_2$ and $\alpha \in \mathbb{C}$, then $\alpha u$ is also in $\mathcal{L}_2$. Indeed, the measurability of u implies
the measurability of $\alpha u$, and if u is of finite energy, then $\alpha u$ is also of finite energy,
because the energy in $\alpha u$ is the product of $|\alpha|^2$ by the energy in u. We thus see
that the operation of amplification of u by $\alpha$ results in an element of $\mathcal{L}_2$ whenever
$u \in \mathcal{L}_2$ and $\alpha \in \mathbb{C}$.

We next show that if the signals u and v are in $\mathcal{L}_2$, then their superposition
u + v must also be in $\mathcal{L}_2$. This holds because a standard result in Measure Theory
guarantees that the superposition of two Lebesgue measurable signals is a Lebesgue
measurable signal and because Proposition 3.4.1 guarantees that if both u and v
are of finite energy, then so is their superposition. Thus the superposition that
maps u and v to u + v results in an element of $\mathcal{L}_2$ whenever $u, v \in \mathcal{L}_2$.

It can be readily verified that the following properties hold:

(i) commutativity:

$u + v = v + u, \quad u, v \in \mathcal{L}_2;$

(ii) associativity:

$(u + v) + w = u + (v + w), \quad u, v, w \in \mathcal{L}_2,$
$(\alpha\beta) u = \alpha(\beta u), \quad (\alpha, \beta \in \mathbb{C},\ u \in \mathcal{L}_2);$

(iii) additive identity: the all-zero signal $0\colon t \mapsto 0$ satisfies

$0 + u = u, \quad u \in \mathcal{L}_2;$

(iv) additive inverse: to every $u \in \mathcal{L}_2$ there corresponds a signal $w \in \mathcal{L}_2$
(namely, the signal $t \mapsto -u(t)$) such that

$u + w = 0;$

(v) multiplicative identity:

$1u = u, \quad u \in \mathcal{L}_2;$

(vi) distributive properties:

$\alpha(u + v) = \alpha u + \alpha v, \quad (\alpha \in \mathbb{C},\ u, v \in \mathcal{L}_2),$
$(\alpha + \beta) u = \alpha u + \beta u, \quad (\alpha, \beta \in \mathbb{C},\ u \in \mathcal{L}_2).$

We conclude that with the operations of superposition and amplification the set $\mathcal{L}_2$
forms a vector space over the complex field (Axler, 1997, Chapter 1). This justifies
referring to the elements of $\mathcal{L}_2$ as "vectors," to the operation of signal superposition
as "vector addition," and to the operation of amplification of an element of $\mathcal{L}_2$ by
a complex scalar as "scalar multiplication."
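
As a rough numerical illustration of the two closure properties (and not as part of the formal development), the following Python sketch treats signals as functions of t and approximates their energy by a midpoint rule on a truncated interval. The helper names `energy`, `superpose`, and `amplify`, the interval [-30, 30], and the two example signals are our own assumptions chosen for illustration.

```python
import math

def energy(u, lo=-30.0, hi=30.0, n=60000):
    """Midpoint-rule approximation of the integral of |u(t)|^2 over [lo, hi]."""
    dt = (hi - lo) / n
    return sum(abs(u(lo + (k + 0.5) * dt)) ** 2 for k in range(n)) * dt

def superpose(u, v):
    """The signal t -> u(t) + v(t)."""
    return lambda t: u(t) + v(t)

def amplify(alpha, u):
    """The signal t -> alpha * u(t)."""
    return lambda t: alpha * u(t)

# Two example signals (assumed for illustration only).
u = lambda t: math.exp(-abs(t))            # energy 1
v = lambda t: 1j * t * math.exp(-abs(t))   # energy 1/2
alpha = 2 - 3j                             # |alpha|^2 = 13

e_u = energy(u)
e_au = energy(amplify(alpha, u))   # should equal |alpha|^2 times e_u
e_sum = energy(superpose(u, v))    # finite, as Proposition 3.4.1 promises
print(e_u, e_au, e_sum)
```

The amplified signal has energy $|\alpha|^2$ times that of u, and the superposition has finite energy, in line with the two closure arguments above.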
4.3 Subspace, Dimension, and Basis

Once we have noted that $\mathcal{L}_2$ together with the operations of superposition and
amplification forms a vector space, we can borrow numerous definitions and results
from the theory of vector spaces. Here we shall focus on the very basic ones.

A linear subspace (or just subspace) of $\mathcal{L}_2$ is a nonempty subset $\mathcal{U}$ of $\mathcal{L}_2$ that
is closed under superposition

$u_1 + u_2 \in \mathcal{U}, \quad u_1, u_2 \in \mathcal{U}$ (4.3)

and under amplification

$\alpha u \in \mathcal{U}, \quad (\alpha \in \mathbb{C},\ u \in \mathcal{U}).$ (4.4)

Example 4.3.1. Consider the set of all functions of the form

$t \mapsto p(t)\, e^{-|t|},$

where p(t) is any polynomial of degree no larger than 3. Thus, the set is the set of
all functions of the form

$t \mapsto (\alpha_0 + \alpha_1 t + \alpha_2 t^2 + \alpha_3 t^3)\, e^{-|t|},$ (4.5)

where $\alpha_0, \alpha_1, \alpha_2, \alpha_3$ are arbitrary complex numbers.

In spite of the polynomial growth of the pre-exponent, all such functions are in $\mathcal{L}_2$
because the exponential decay more than compensates for the polynomial growth.
The above set is thus a subset of $\mathcal{L}_2$. Moreover, as we show next, this is a linear
subspace of $\mathcal{L}_2$.

If u is of the form (4.5), then so is $\alpha u$, because $\alpha u$ is the mapping

$t \mapsto (\alpha\alpha_0 + \alpha\alpha_1 t + \alpha\alpha_2 t^2 + \alpha\alpha_3 t^3)\, e^{-|t|},$

which is of the same form.
Similarly, if u is as given in (4.5) and

$v\colon t \mapsto (\beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3)\, e^{-|t|},$

then u + v is the mapping

$t \mapsto \bigl( (\alpha_0 + \beta_0) + (\alpha_1 + \beta_1) t + (\alpha_2 + \beta_2) t^2 + (\alpha_3 + \beta_3) t^3 \bigr)\, e^{-|t|},$

which is again of this form.
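
The closure argument of Example 4.3.1 amounts to saying that superposition and amplification act coefficient by coefficient. A minimal Python sketch of this observation follows; the function names `evaluate`, `add`, and `scale` and the particular coefficient values are hypothetical and chosen only for illustration.

```python
import math

def evaluate(coeffs, t):
    """Value at t of the signal t -> (c0 + c1 t + c2 t^2 + c3 t^3) exp(-|t|)."""
    poly = sum(c * t ** k for k, c in enumerate(coeffs))
    return poly * math.exp(-abs(t))

def add(c, d):
    """Coefficients of the superposition of two signals of the form (4.5)."""
    return tuple(a + b for a, b in zip(c, d))

def scale(alpha, c):
    """Coefficients of the amplification by alpha of a signal of the form (4.5)."""
    return tuple(alpha * a for a in c)

u = (1, 2j, 0, -1)        # alpha_0, ..., alpha_3 (arbitrary choice)
v = (0.5, 1, 3 - 1j, 0)   # beta_0, ..., beta_3 (arbitrary choice)
alpha = 2 + 1j

# Largest pointwise mismatch over a few sample epochs; both should be tiny.
samples = (-2.0, 0.0, 0.7, 3.5)
max_add_gap = max(abs(evaluate(add(u, v), t) - (evaluate(u, t) + evaluate(v, t)))
                  for t in samples)
max_scale_gap = max(abs(evaluate(scale(alpha, u), t) - alpha * evaluate(u, t))
                    for t in samples)
print(max_add_gap, max_scale_gap)
```

Because the result of either operation is again described by four complex coefficients, it is again of the form (4.5).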

An n-tuple of vectors from $\mathcal{L}_2$ is a (possibly empty) ordered list of n vectors
from $\mathcal{L}_2$ separated by commas and enclosed in parentheses, e.g., $(v_1, \ldots, v_n)$. Here
$n \ge 0$ can be any nonnegative integer, where the case n = 0 corresponds to the
empty list.

A vector $v \in \mathcal{L}_2$ is said to be a linear combination of the n-tuple $(v_1, \ldots, v_n)$ if
it is equal to

$\alpha_1 v_1 + \cdots + \alpha_n v_n,$ (4.6)

which is written more succinctly as

$\sum_{\nu=1}^{n} \alpha_\nu v_\nu,$ (4.7)

for some scalars $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$. The all-zero signal is a linear combination of any
n-tuple including the empty tuple.

The span of an n-tuple $(v_1, \ldots, v_n)$ of vectors in $\mathcal{L}_2$ is denoted by

$\operatorname{span}(v_1, \ldots, v_n)$

and is the set of all vectors in $\mathcal{L}_2$ that are linear combinations of $(v_1, \ldots, v_n)$:

$\operatorname{span}(v_1, \ldots, v_n) = \{ \alpha_1 v_1 + \cdots + \alpha_n v_n : \alpha_1, \ldots, \alpha_n \in \mathbb{C} \}.$ (4.8)

(The span of the empty tuple is given by the one-element set {0} containing the
all-zero signal only.)

Note that for any n-tuple of vectors $(v_1, \ldots, v_n)$ in $\mathcal{L}_2$ we have that $\operatorname{span}(v_1, \ldots, v_n)$
is a linear subspace of $\mathcal{L}_2$. Also, if $\mathcal{U}$ is a linear subspace of $\mathcal{L}_2$ and if the vectors
$u_1, \ldots, u_n$ are in $\mathcal{U}$, then $\operatorname{span}(u_1, \ldots, u_n)$ is a linear subspace which is contained
in $\mathcal{U}$. A subspace $\mathcal{U}$ of $\mathcal{L}_2$ is said to be finite-dimensional if there exists an
n-tuple $(u_1, \ldots, u_n)$ of vectors in $\mathcal{U}$ such that $\operatorname{span}(u_1, \ldots, u_n) = \mathcal{U}$. Otherwise,
we say that $\mathcal{U}$ is infinite-dimensional. For example, the space of all mappings
of the form $t \mapsto p(t)\, e^{-|t|}$ for some polynomial $p(\cdot)$ can be shown to be
infinite-dimensional, but under the restriction that $p(\cdot)$ be of degree smaller than 5, it is
finite-dimensional. If $\mathcal{U}$ is a finite-dimensional subspace and if $\mathcal{U}'$ is a subspace
contained in $\mathcal{U}$, then $\mathcal{U}'$ must also be finite-dimensional.

An n-tuple of signals $(v_1, \ldots, v_n)$ in $\mathcal{L}_2$ is said to be linearly independent if
whenever the scalars $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$ are such that $\alpha_1 v_1 + \cdots + \alpha_n v_n = 0$, we have
$\alpha_1 = \cdots = \alpha_n = 0$. I.e., if

$\Bigl( \sum_{\nu=1}^{n} \alpha_\nu v_\nu = 0 \Bigr) \implies \bigl( \alpha_\nu = 0, \ \nu = 1, \ldots, n \bigr).$ (4.9)

(By convention, the empty tuple is linearly independent.) For example, the 3-tuple
consisting of the signals $t \mapsto e^{-|t|}$, $t \mapsto t\,e^{-|t|}$, and $t \mapsto t^2 e^{-|t|}$ is linearly
independent. If $(v_1, \ldots, v_n)$ is not linearly independent, then we say that it is
linearly dependent. For example, the 3-tuple consisting of the signals $t \mapsto e^{-|t|}$,
$t \mapsto t\,e^{-|t|}$, and $t \mapsto (2t + 1)\, e^{-|t|}$ is linearly dependent. The n-tuple $(v_1, \ldots, v_n)$
is linearly dependent if, and only if, (at least) one of the signals in the tuple can
be written as a linear combination of the others.
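
The two 3-tuples in the examples above can be checked mechanically. The sketch below is our own illustration and rests on an assumption we state explicitly: since $e^{-|t|}$ never vanishes, a linear combination of signals of the form $t \mapsto p(t)\,e^{-|t|}$ is the all-zero signal exactly when the same combination of the polynomial coefficient vectors is zero, so for three signals with polynomials of degree at most 2 one may test a 3-by-3 determinant. The helper `det3` is hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Rows hold the polynomial coefficients (constant, t, t^2) of each signal.
independent = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # e^{-|t|}, t e^{-|t|}, t^2 e^{-|t|}
dependent = [(1, 0, 0), (0, 1, 0), (1, 2, 0)]    # e^{-|t|}, t e^{-|t|}, (2t+1) e^{-|t|}

print(det3(independent))  # nonzero: only the trivial combination gives 0
print(det3(dependent))    # zero: some nontrivial combination gives 0

# An explicit nontrivial combination for the dependent tuple: 1*v1 + 2*v2 - 1*v3.
weights = (1, 2, -1)
combo = tuple(sum(w * row[k] for w, row in zip(weights, dependent)) for k in range(3))
print(combo)
```

The last line exhibits the third signal as a linear combination of the first two, which is the characterization of linear dependence given above.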

The d-tuple $(u_1, \ldots, u_d)$ is said to form a basis for the linear subspace $\mathcal{U}$ if it is
linearly independent and if $\operatorname{span}(u_1, \ldots, u_d) = \mathcal{U}$. The latter condition is equivalent
to the requirement that every $u \in \mathcal{U}$ can be represented as

$u = \alpha_1 u_1 + \cdots + \alpha_d u_d$ (4.10)

for some $\alpha_1, \ldots, \alpha_d \in \mathbb{C}$. The former condition that the tuple $(u_1, \ldots, u_d)$ be
linearly independent guarantees that if such a representation exists, then it is
unique. Thus, $(u_1, \ldots, u_d)$ forms a basis for $\mathcal{U}$ if $u_1, \ldots, u_d \in \mathcal{U}$ (thus guaranteeing
that $\operatorname{span}(u_1, \ldots, u_d) \subseteq \mathcal{U}$) and if every $u \in \mathcal{U}$ can be written uniquely as in (4.10).

Every finite-dimensional linear subspace $\mathcal{U}$ has a basis, and all bases for $\mathcal{U}$ have the
same number of elements. This number is called the dimension of $\mathcal{U}$. Thus, if $\mathcal{U}$
is a finite-dimensional subspace and if both $(u_1, \ldots, u_d)$ and $(u'_1, \ldots, u'_{d'})$ form a
basis for $\mathcal{U}$, then $d = d'$ and both are equal to the dimension of $\mathcal{U}$. The dimension
of the subspace {0} is zero.

4.4 $\|u\|_2$ as the "length" of the Signal $u(\cdot)$

Having presented the elements of $\mathcal{L}_2$ as vectors, we next propose to view $\|u\|_2$ as
the "length" of the vector $u \in \mathcal{L}_2$. To motivate this view, we first present the key
properties of $\|\cdot\|_2$.

Proposition 4.4.1 (Properties of $\|\cdot\|_2$). Let u and v be elements of $\mathcal{L}_2$, and let $\alpha$
be some complex number. Then

$\|\alpha u\|_2 = |\alpha|\, \|u\|_2,$ (4.11)

$\|u + v\|_2 \le \|u\|_2 + \|v\|_2,$ (4.12)

and

$\bigl( \|u\|_2 = 0 \bigr) \iff \bigl( u \equiv 0 \bigr).$ (4.13)

Proof. Identity (4.11) follows directly from the definition of $\|\cdot\|_2$; see (4.2).
Inequality (4.12) is a restatement of Proposition 3.4.1. The equivalence of the
condition $\|u\|_2 = 0$ and the condition that u is indistinguishable from the all-zero
signal follows from Proposition 2.5.3. □

Identity (4.11) is in agreement with our intuition that stretching a vector merely
scales its length. Inequality (4.12) is sometimes called the Triangle Inequality
because it is reminiscent of the theorem from planar geometry that states that the
length of no side of a triangle can exceed the sum of the lengths of the others; see
Figure 4.1.

Substituting -y for u and x + y for v in (4.12) yields $\|x\|_2 \le \|y\|_2 + \|x + y\|_2$,
i.e., the inequality $\|x + y\|_2 \ge \|x\|_2 - \|y\|_2$. And substituting -x for u and x + y
for v in (4.12) yields the inequality $\|y\|_2 \le \|x\|_2 + \|x + y\|_2$, i.e., the inequality
$\|x + y\|_2 \ge \|y\|_2 - \|x\|_2$. Combining the two inequalities we obtain the inequality
$\|x + y\|_2 \ge \bigl| \|x\|_2 - \|y\|_2 \bigr|$. This inequality can be combined with the inequality
$\|x + y\|_2 \le \|x\|_2 + \|y\|_2$ in the compact form of a double-sided inequality

$\bigl| \|x\|_2 - \|y\|_2 \bigr| \le \|x + y\|_2 \le \|x\|_2 + \|y\|_2, \quad x, y \in \mathcal{L}_2.$ (4.14)

Finally, (4.13) "almost" supports the intuition that the only vector of length zero
is the zero-vector. In our case, alas, we can only claim that if a vector is of zero
length, then it is indistinguishable from the all-zero signal, i.e., that all t's outside
a set of Lebesgue measure zero are mapped by the signal to zero.
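
The scaling identity (4.11) and the double-sided inequality (4.14) are easy to probe numerically. The following Python sketch does so for two example signals of our own choosing; the helper `norm`, the truncation to [-30, 30], and the midpoint-rule integration are assumptions made purely for illustration.

```python
import math

def norm(u, lo=-30.0, hi=30.0, n=20000):
    """Midpoint-rule approximation of the square root of the energy of u."""
    dt = (hi - lo) / n
    return math.sqrt(sum(abs(u(lo + (k + 0.5) * dt)) ** 2 for k in range(n)) * dt)

# Example signals (assumed for illustration only).
x = lambda t: math.exp(-abs(t))
y = lambda t: -0.5j * t * math.exp(-t * t)
x_plus_y = lambda t: x(t) + y(t)
alpha = 3 - 4j                      # |alpha| = 5
alpha_x = lambda t: alpha * x(t)

nx, ny, nxy = norm(x), norm(y), norm(x_plus_y)
lower, upper = abs(nx - ny), nx + ny          # the two sides of (4.14)
print(lower, nxy, upper)                      # lower <= nxy <= upper
print(norm(alpha_x), abs(alpha) * nx)         # the two sides of (4.11)
```

Because the sampled signals are themselves finite-dimensional vectors, the inequalities hold for the approximations as well, up to floating-point rounding.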



Figure 4.1: A geometric interpretation of the Triangle Inequality for energy-limited
signals: $\|u + v\|_2 \le \|u\|_2 + \|v\|_2$.

Figure 4.2: Illustration of the shortest path property in $\mathcal{L}_2$. The shortest path
from A to B is no longer than the sum of the shortest path from A to C and the
shortest path from C to B.

The Triangle Inequality (4.12) can also be stated slightly differently. In planar
geometry the sum of the lengths of two sides of a triangle can never be smaller
than the length of the remaining side. Thus, the shortest path from Point A to
Point B cannot exceed the sum of the lengths of the shortest paths from Point A to
Point C, and from Point C to Point B. By applying Inequality (4.12) to the signals
u - w and w - v we obtain

$\|u - v\|_2 \le \|u - w\|_2 + \|w - v\|_2, \quad u, v, w \in \mathcal{L}_2,$

i.e., that the distance from u to v cannot exceed the sum of distances from u to w
and from w to v. See Figure 4.2.
4.5 Orthogonality and Inner Products

To further develop our geometric view of $\mathcal{L}_2$ we next discuss orthogonality. We
shall motivate its definition with an attempt to generalize Pythagoras's Theorem
to $\mathcal{L}_2$. As an initial attempt at defining orthogonality we might define two functions
$u, v \in \mathcal{L}_2$ to be orthogonal if $\|u + v\|_2^2 = \|u\|_2^2 + \|v\|_2^2$. Recalling the
definition of $\|\cdot\|_2$ (4.2) we obtain that this condition is equivalent to the condition
$\operatorname{Re}\bigl( \int u(t)\, v^*(t)\, dt \bigr) = 0$, because

$\|u + v\|_2^2 = \int_{-\infty}^{\infty} |u(t) + v(t)|^2\, dt$
$= \int_{-\infty}^{\infty} \bigl( u(t) + v(t) \bigr) \bigl( u(t) + v(t) \bigr)^* dt$
$= \int_{-\infty}^{\infty} \bigl( |u(t)|^2 + |v(t)|^2 + 2 \operatorname{Re}\bigl( u(t)\, v^*(t) \bigr) \bigr)\, dt$
$= \|u\|_2^2 + \|v\|_2^2 + 2 \operatorname{Re}\Bigl( \int_{-\infty}^{\infty} u(t)\, v^*(t)\, dt \Bigr), \quad u, v \in \mathcal{L}_2,$ (4.15)

where we have used the fact that integration commutes with the operation of taking
the real part; see Proposition 2.3.1.

While this approach would work well for real-valued functions, it has some
embarrassing consequences when it comes to complex-valued functions. It allows for the
possibility that u is orthogonal to v, but that its scaled version $\alpha u$ is not. For example,
with this definition, the function $t \mapsto i\, I\{|t| \le 5\}$ is orthogonal to the function
$t \mapsto I\{|t| \le 17\}$ but its scaled (by $\alpha = i$) version $t \mapsto i \cdot i\, I\{|t| \le 5\} = -I\{|t| \le 5\}$ is
not. To avoid this embarrassment, we define u to be orthogonal to v if

$\|\alpha u + v\|_2^2 = \|\alpha u\|_2^2 + \|v\|_2^2, \quad \alpha \in \mathbb{C}.$

This, by (4.15), is equivalent to

$\operatorname{Re}\Bigl( \alpha \int_{-\infty}^{\infty} u(t)\, v^*(t)\, dt \Bigr) = 0, \quad \alpha \in \mathbb{C},$

i.e., to the condition

$\int_{-\infty}^{\infty} u(t)\, v^*(t)\, dt = 0$ (4.16)

(because if $z \in \mathbb{C}$ is such that $\operatorname{Re}(\alpha z) = 0$ for all $\alpha \in \mathbb{C}$, then z = 0). Recalling the
definition of the inner product $\langle u, v \rangle$ from (3.4)

$\langle u, v \rangle = \int_{-\infty}^{\infty} u(t)\, v^*(t)\, dt,$ (4.17)

we conclude that (4.16) is equivalent to the condition $\langle u, v \rangle = 0$ or, equivalently
(because by (3.6) $\langle u, v \rangle = \langle v, u \rangle^*$) to the condition $\langle v, u \rangle = 0$.

Definition 4.5.1 (Orthogonal Signals in $\mathcal{L}_2$). The signals $u, v \in \mathcal{L}_2$ are said to
be orthogonal if

$\langle u, v \rangle = 0.$ (4.18)

The n-tuple $(u_1, \ldots, u_n)$ is said to be orthogonal if any two signals in the tuple are
orthogonal

$\langle u_\ell, u_{\ell'} \rangle = 0, \quad \bigl( \ell \ne \ell', \ \ell, \ell' \in \{1, \ldots, n\} \bigr).$ (4.19)

The reader is encouraged to verify that if u is orthogonal to v then so is $\alpha u$. Also,
u is orthogonal to v if, and only if, v is orthogonal to u. Finally, every function is
orthogonal to the all-zero function 0.
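
The embarrassing example that motivated Definition 4.5.1 can be reproduced numerically. In the sketch below (our own illustration), the helper `inner` approximates the inner product (4.17) by a midpoint rule on [-20, 20]; the grid is an assumption chosen so that the edges of both indicator functions fall on cell boundaries.

```python
def inner(u, v, lo=-20.0, hi=20.0, n=4000):
    """Midpoint-rule approximation of <u, v> = integral of u(t) v*(t) dt."""
    dt = (hi - lo) / n
    total = 0j
    for k in range(n):
        t = lo + (k + 0.5) * dt
        total += u(t) * v(t).conjugate()
    return total * dt

u = lambda t: 1j if abs(t) <= 5 else 0j        # t -> i I{|t| <= 5}
v = lambda t: 1 + 0j if abs(t) <= 17 else 0j   # t -> I{|t| <= 17}
iu = lambda t: 1j * u(t)                       # u scaled by alpha = i

ip_uv = inner(u, v)    # purely imaginary: real part 0, so the naive test passes
ip_iuv = inner(iu, v)  # real and nonzero: the naive test fails after scaling
print(ip_uv, ip_iuv)
```

The real part of the first inner product vanishes, so the naive definition would declare u and v orthogonal, yet the inner product itself is nonzero; under Definition 4.5.1 the two signals are therefore not orthogonal, and no inconsistency arises under scaling.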

Having judiciously defined orthogonality in $\mathcal{L}_2$, we can now extend Pythagoras's
Theorem.

Theorem 4.5.2 (A Pythagorean Theorem). If the n-tuple of vectors $(u_1, \ldots, u_n)$
in $\mathcal{L}_2$ is orthogonal, then

$\|u_1 + \cdots + u_n\|_2^2 = \|u_1\|_2^2 + \cdots + \|u_n\|_2^2.$

Proof. This theorem can be proved by induction on n. The case n = 2 follows
from (4.15) using Definition 4.5.1 and (4.17).

Assume now that the theorem holds for $n = \nu$, for some $\nu \ge 2$, i.e.,

$\|u_1 + \cdots + u_\nu\|_2^2 = \|u_1\|_2^2 + \cdots + \|u_\nu\|_2^2,$

and let us show that this implies that it also holds for $n = \nu + 1$, i.e., that

$\|u_1 + \cdots + u_{\nu+1}\|_2^2 = \|u_1\|_2^2 + \cdots + \|u_{\nu+1}\|_2^2.$

To that end, let

$v = u_1 + \cdots + u_\nu.$ (4.20)

Since the $\nu$-tuple $(u_1, \ldots, u_\nu)$ is orthogonal, our induction hypothesis guarantees
that

$\|v\|_2^2 = \|u_1\|_2^2 + \cdots + \|u_\nu\|_2^2.$ (4.21)

Now v is orthogonal to $u_{\nu+1}$ because

$\langle v, u_{\nu+1} \rangle = \langle u_1 + \cdots + u_\nu, u_{\nu+1} \rangle$
$= \langle u_1, u_{\nu+1} \rangle + \cdots + \langle u_\nu, u_{\nu+1} \rangle$
$= 0,$

so by the n = 2 case

$\|v + u_{\nu+1}\|_2^2 = \|v\|_2^2 + \|u_{\nu+1}\|_2^2.$ (4.22)

Combining (4.20), (4.21), and (4.22) we obtain

$\|u_1 + \cdots + u_{\nu+1}\|_2^2 = \|v + u_{\nu+1}\|_2^2$
$= \|v\|_2^2 + \|u_{\nu+1}\|_2^2$
$= \|u_1\|_2^2 + \cdots + \|u_{\nu+1}\|_2^2.$ □
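
A quick numerical check of the Pythagorean Theorem for a concrete orthogonal 3-tuple may help fix ideas. The signals below (a constant, a multiple of t, and $3t^2 - 1$, all restricted to [-1, 1] and zero elsewhere) are our own choice for illustration, as are the helper `energy` and the midpoint-rule integration.

```python
def energy(u, lo=-1.0, hi=1.0, n=20000):
    """Midpoint-rule approximation of the energy of a signal supported on [lo, hi]."""
    dt = (hi - lo) / n
    return sum(abs(u(lo + (k + 0.5) * dt)) ** 2 for k in range(n)) * dt

# Pairwise orthogonal signals on [-1, 1] (assumed example).
u1 = lambda t: 1.0
u2 = lambda t: 1j * t
u3 = lambda t: 3 * t * t - 1
total = lambda t: u1(t) + u2(t) + u3(t)

lhs = energy(total)                          # energy of the sum
rhs = energy(u1) + energy(u2) + energy(u3)   # sum of the energies
print(lhs, rhs)
```

The two printed values agree up to the discretization error, as the theorem predicts for an orthogonal tuple.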
Figure 4.3: The projection w of the vector v onto u.

To derive a geometric interpretation for the inner product $\langle u, v \rangle$ we next extend
to $\mathcal{L}_2$ the notion of the projection of a vector onto another. We first recall the
definition for vectors in $\mathbb{R}^2$. Consider two nonzero vectors u and v in the real
plane $\mathbb{R}^2$. The projection w of the vector v onto u is a scaled version of u. More
specifically, it is a scaled version of u and its length is equal to the product of the
length of v multiplied by the cosine of the angle between v and u (see Figure 4.3).
More explicitly,

$w = (\text{length of } v)\, \cos(\text{angle between } v \text{ and } u)\, \dfrac{u}{\text{length of } u}.$ (4.23)

This definition does not seem to have a natural extension to $\mathcal{L}_2$ because we have not
defined the angle between two signals. An alternative definition of the projection,
and one that is more amenable to extensions to $\mathcal{L}_2$, is the following. The vector w
is the projection of the vector v onto u, if w is a scaled version of u, and if v - w
is orthogonal to u.

This definition makes perfect sense in $\mathcal{L}_2$ too, because we have already defined
what we mean by "scaled version" (i.e., "amplification" or "scalar multiplication")
and "orthogonality." We thus have:

Definition 4.5.3 (Projection of a Signal in $\mathcal{L}_2$ onto another). Let $u \in \mathcal{L}_2$ have
positive energy. The projection of the signal $v \in \mathcal{L}_2$ onto the signal $u \in \mathcal{L}_2$
is the signal w that satisfies both of the following conditions:

1) $w = \alpha u$ for some $\alpha \in \mathbb{C}$ and

2) v - w is orthogonal to u.

Note that since $\mathcal{L}_2$ is closed with respect to scalar multiplication, Condition 1)
guarantees that the projection w is in $\mathcal{L}_2$.

Prima facie it is not clear that a projection always exists and that it is unique.
Nevertheless, this is the case. We prove this by finding an explicit expression
for w. We need to find some $\alpha \in \mathbb{C}$ so that $\alpha u$ will satisfy the requirements of
the projection. The scalar $\alpha$ is chosen so as to guarantee that v - w is orthogonal
to u. That is, we seek to solve for $\alpha \in \mathbb{C}$ satisfying

$\langle v - \alpha u, u \rangle = 0,$

i.e.,

$\langle v, u \rangle - \alpha \|u\|_2^2 = 0.$

Recalling our hypothesis that $\|u\|_2 > 0$ (strictly), we conclude that $\alpha$ is uniquely
given by

$\alpha = \dfrac{\langle v, u \rangle}{\|u\|_2^2},$

and the projection w is thus unique and is given by

$w = \dfrac{\langle v, u \rangle}{\|u\|_2^2}\, u.$ (4.24)



Comparing (4.23) and (4.24) we can interpret

$\dfrac{\langle v, u \rangle}{\|v\|_2\, \|u\|_2}$ (4.25)

as the cosine of the angle between the function v and the function u (provided
that neither u nor v is zero). If the inner product is zero, then we have said that
v and u are orthogonal, which is consistent with the cosine of the angle between
them being zero. Note, however, that this interpretation should be taken with a
grain of salt because in the complex case the inner product in (4.25) is typically a
complex number.

The interpretation of (4.25) as the cosine of the angle between v and u is further
supported by noting that the magnitude of (4.25) is always in the range [0, 1]. This
follows directly from the Cauchy-Schwarz Inequality (Theorem 3.3.1) to which we
next give another (geometric) proof. Let w be the projection of v onto u. Then
starting from (4.24)

$\dfrac{|\langle v, u \rangle|^2}{\|u\|_2^2} = \|w\|_2^2$
$\le \|w\|_2^2 + \|v - w\|_2^2$
$= \|w + (v - w)\|_2^2$
$= \|v\|_2^2,$ (4.26)

where the first equality follows from (4.24); the subsequent inequality from the
nonnegativity of $\|\cdot\|_2$; and the subsequent equality by the Pythagorean Theorem
because, by its definition, the projection w of v onto u must satisfy that v - w is
orthogonal to u and hence also to w, which is a scaled version of u. The
Cauchy-Schwarz Inequality now follows by taking the square root of both sides of (4.26).
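
To make (4.24) and (4.25) concrete, the following Python sketch projects one example signal onto another and evaluates the "cosine." The signals (a constant and $t \mapsto t + i$, both restricted to [-1, 1]), the helper `inner`, and the midpoint-rule integration are assumptions made for illustration only.

```python
import math

def inner(u, v, lo=-1.0, hi=1.0, n=20000):
    """Midpoint-rule approximation of <u, v> for signals supported on [lo, hi]."""
    dt = (hi - lo) / n
    total = 0j
    for k in range(n):
        t = lo + (k + 0.5) * dt
        total += u(t) * complex(v(t)).conjugate()
    return total * dt

u = lambda t: 1.0        # signal we project onto (positive energy)
v = lambda t: t + 1j     # signal being projected

alpha = inner(v, u) / inner(u, u).real   # the scalar in (4.24)
w = lambda t: alpha * u(t)               # the projection of v onto u
residual = lambda t: v(t) - w(t)         # v - w, which should be orthogonal to u

cosine = inner(v, u) / math.sqrt(inner(v, v).real * inner(u, u).real)  # (4.25)
print(alpha, abs(inner(residual, u)), cosine, abs(cosine))
```

In this example the "cosine" turns out to be purely imaginary, which illustrates why the angle interpretation should be taken with a grain of salt; its magnitude nevertheless lies in [0, 1], as the Cauchy-Schwarz Inequality guarantees.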
4.6 Orthonormal Bases

We next consider orthonormal bases for finite-dimensional linear subspaces. These
are special bases that are particularly useful for the calculation of projections and
inner products.

4.6.1 Definition

Definition 4.6.1 (Orthonormal Tuple). An n-tuple of signals in $\mathcal{L}_2$ is said to be
orthonormal if it is orthogonal and if each of the signals in the tuple is of unit
energy.

Thus, the n-tuple $(\phi_1, \ldots, \phi_n)$ of signals in $\mathcal{L}_2$ is orthonormal, if

$\langle \phi_\ell, \phi_{\ell'} \rangle = \begin{cases} 0 & \text{if } \ell \ne \ell', \\ 1 & \text{if } \ell = \ell', \end{cases} \quad \ell, \ell' \in \{1, \ldots, n\}.$ (4.27)

Linearly independent tuples need not be orthonormal, but orthonormal tuples must
be linearly independent:

Proposition 4.6.2 (Orthonormal Tuples Are Linearly Independent). If a tuple of
signals in $\mathcal{L}_2$ is orthonormal, then it must be linearly independent.

Proof. Let the n-tuple $(\phi_1, \ldots, \phi_n)$ of signals in $\mathcal{L}_2$ be orthonormal, i.e., satisfy
(4.27). We need to show that if

$\sum_{\ell=1}^{n} \alpha_\ell \phi_\ell = 0,$ (4.28)

then all the coefficients $\alpha_1, \ldots, \alpha_n$ must be zero. To that end, assume (4.28). It
then follows that for every $\ell' \in \{1, \ldots, n\}$

$0 = \langle 0, \phi_{\ell'} \rangle$
$= \Bigl\langle \sum_{\ell=1}^{n} \alpha_\ell \phi_\ell, \phi_{\ell'} \Bigr\rangle$
$= \sum_{\ell=1}^{n} \alpha_\ell \langle \phi_\ell, \phi_{\ell'} \rangle$
$= \sum_{\ell=1}^{n} \alpha_\ell\, I\{\ell = \ell'\}$
$= \alpha_{\ell'},$

thus demonstrating that (4.28) implies that $\alpha_{\ell'} = 0$ for every $\ell' \in \{1, \ldots, n\}$. Here
the first equality follows because 0 is orthogonal to every energy-limited signal
and, a fortiori, to $\phi_{\ell'}$; the second by (4.28); the third by the linearity of the inner
product in its left argument (3.7) & (3.9); and the fourth by (4.27). □
Definition 4.6.3 (Orthonormal Basis). A d-tuple of signals in $\mathcal{L}_2$ is said to form
an orthonormal basis for the linear subspace $\mathcal{U} \subset \mathcal{L}_2$ if it is orthonormal and
its span is $\mathcal{U}$.

4.6.2 Representing a Signal Using an Orthonormal Basis

Suppose that $(\phi_1, \ldots, \phi_d)$ is an orthonormal basis for $\mathcal{U} \subset \mathcal{L}_2$. The fact that
$(\phi_1, \ldots, \phi_d)$ spans $\mathcal{U}$ guarantees that every $u \in \mathcal{U}$ can be written as $u = \sum_\ell \alpha_\ell \phi_\ell$
for some coefficients $\alpha_1, \ldots, \alpha_d \in \mathbb{C}$. The fact that $(\phi_1, \ldots, \phi_d)$ is orthonormal
implies, by Proposition 4.6.2, that it is also linearly independent and hence that
the coefficients $\{\alpha_\ell\}$ are unique. How does one go about finding these coefficients?
We next show that the orthonormality of $(\phi_1, \ldots, \phi_d)$ also implies a very simple
expression for $\alpha_\ell$ above. Indeed, as the next proposition demonstrates, $\alpha_\ell$ is given
explicitly as $\langle u, \phi_\ell \rangle$.

Proposition 4.6.4 (Representing a Signal Using an Orthonormal Basis).

(i) If $(\phi_1, \ldots, \phi_d)$ is an orthonormal tuple of functions in $\mathcal{L}_2$ and if $u \in \mathcal{L}_2$
can be written as $u = \sum_{\ell=1}^{d} \alpha_\ell \phi_\ell$ for some complex numbers $\alpha_1, \ldots, \alpha_d$, then
$\alpha_\ell = \langle u, \phi_\ell \rangle$ for every $\ell \in \{1, \ldots, d\}$:

$\Bigl( u = \sum_{\ell=1}^{d} \alpha_\ell \phi_\ell \Bigr) \implies \bigl( \alpha_\ell = \langle u, \phi_\ell \rangle, \ \ell \in \{1, \ldots, d\} \bigr), \quad \bigl( (\phi_1, \ldots, \phi_d) \text{ orthonormal} \bigr).$ (4.29)

(ii) If $(\phi_1, \ldots, \phi_d)$ is an orthonormal basis for the subspace $\mathcal{U} \subset \mathcal{L}_2$, then

$u = \sum_{\ell=1}^{d} \langle u, \phi_\ell \rangle\, \phi_\ell, \quad u \in \mathcal{U}.$ (4.30)



Proof. We begin by proving Part (i). If $u = \sum_{\ell=1}^{d} \alpha_\ell \phi_\ell$, then for every
$\ell' \in \{1, \ldots, d\}$

$\langle u, \phi_{\ell'} \rangle = \Bigl\langle \sum_{\ell=1}^{d} \alpha_\ell \phi_\ell, \phi_{\ell'} \Bigr\rangle$
$= \sum_{\ell=1}^{d} \alpha_\ell \langle \phi_\ell, \phi_{\ell'} \rangle$
$= \sum_{\ell=1}^{d} \alpha_\ell\, I\{\ell = \ell'\}$
$= \alpha_{\ell'},$

thus proving Part (i).

We next prove Part (ii). Let $u \in \mathcal{U}$ be arbitrary. Since, by assumption, the tuple
$(\phi_1, \ldots, \phi_d)$ forms an orthonormal basis for $\mathcal{U}$ it follows a fortiori that its span
is $\mathcal{U}$ and, consequently, that there exist coefficients $\alpha_1, \ldots, \alpha_d \in \mathbb{C}$ such that

$u = \sum_{\ell=1}^{d} \alpha_\ell \phi_\ell.$ (4.31)

It now follows from Part (i) that for each $\ell \in \{1, \ldots, d\}$ the coefficient $\alpha_\ell$ in (4.31)
must be equal to $\langle u, \phi_\ell \rangle$, thus establishing (4.30). □

This proposition shows that if $(\phi_1, \ldots, \phi_d)$ is an orthonormal basis for the
subspace $\mathcal{U}$ and if $u \in \mathcal{U}$, then u is fully determined by the complex constants $\langle u, \phi_1 \rangle$,
. . . , $\langle u, \phi_d \rangle$. Thus, any calculation involving u can be computed from these
constants by first reconstructing u using the proposition. As we shall see in
Proposition 4.6.9, calculations involving inner products and norms are, however, simpler
than that.
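
Proposition 4.6.4 can be exercised numerically: build a signal from known coefficients and recover them by inner products. The orthonormal tuple below (normalized Legendre polynomials of degree 0, 1, 2, restricted to [-1, 1] and zero elsewhere) is our own choice of example, and the helper `inner` with its midpoint rule is an assumption made for illustration.

```python
import math

def inner(u, v, lo=-1.0, hi=1.0, n=20000):
    """Midpoint-rule approximation of <u, v> for signals supported on [lo, hi]."""
    dt = (hi - lo) / n
    total = 0j
    for k in range(n):
        t = lo + (k + 0.5) * dt
        total += u(t) * complex(v(t)).conjugate()
    return total * dt

# An orthonormal 3-tuple on [-1, 1] (assumed example).
phi = [
    lambda t: math.sqrt(1 / 2),
    lambda t: math.sqrt(3 / 2) * t,
    lambda t: math.sqrt(5 / 8) * (3 * t * t - 1),
]

alphas = [2 - 1j, 3j, -1]                                  # chosen coefficients
u = lambda t: sum(a * p(t) for a, p in zip(alphas, phi))   # u = sum of alpha_l phi_l

recovered = [inner(u, p) for p in phi]                     # <u, phi_l>, as in (4.29)
print(recovered)
```

Up to the discretization error, the recovered inner products coincide with the coefficients used to synthesize u.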



4.6.3 Projection

We next discuss the projection of a signal $v \in \mathcal{L}_2$ onto a finite-dimensional linear
subspace $\mathcal{U}$ that has an orthonormal basis $(\phi_1, \ldots, \phi_d)$.¹ To define the projection
we shall extend the approach we adopted in Section 4.5 for the projection of the
vector v onto the vector u. Recall that in that section we defined the projection
as the vector w that is a scaled version of u and that satisfies that (v - w) is
orthogonal to u. Of course, if (v - w) is orthogonal to u, then it is orthogonal to
any scaled version of u, i.e., it is orthogonal to every signal in the space span(u).

We would like to adopt this approach and to define the projection of $v \in \mathcal{L}_2$ onto $\mathcal{U}$
as the element w of $\mathcal{U}$ for which (v - w) is orthogonal to every signal in $\mathcal{U}$. Before
we can adopt this definition, we must show that such an element of $\mathcal{U}$ always exists
and that it is unique.

Lemma 4.6.5. Let $(\phi_1, \ldots, \phi_d)$ be an orthonormal basis for the linear subspace
$\mathcal{U} \subset \mathcal{L}_2$. Let $v \in \mathcal{L}_2$ be arbitrary.

(i) The signal $v - \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell$ is orthogonal to every signal in $\mathcal{U}$:

$\Bigl\langle v - \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell, \ u \Bigr\rangle = 0, \quad \bigl( v \in \mathcal{L}_2, \ u \in \mathcal{U} \bigr).$ (4.32)

(ii) If $w \in \mathcal{U}$ is such that v - w is orthogonal to every signal in $\mathcal{U}$, then

$w = \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell.$ (4.33)

¹ As we shall see in Section 4.6.5, not every finite-dimensional linear subspace of $\mathcal{L}_2$ has an
orthonormal basis. Here we shall only discuss projections onto subspaces that do.
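Proof. To prove (4.32) we first verify that it holds when $u = \phi_{\ell'}$, for some $\ell'$ in
the set $\{1, \ldots, d\}$:

$\Bigl\langle v - \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell, \ \phi_{\ell'} \Bigr\rangle = \langle v, \phi_{\ell'} \rangle - \Bigl\langle \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell, \ \phi_{\ell'} \Bigr\rangle$
$= \langle v, \phi_{\ell'} \rangle - \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle \langle \phi_\ell, \phi_{\ell'} \rangle$
$= \langle v, \phi_{\ell'} \rangle - \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, I\{\ell = \ell'\}$
$= \langle v, \phi_{\ell'} \rangle - \langle v, \phi_{\ell'} \rangle$
$= 0, \quad \ell' \in \{1, \ldots, d\}.$ (4.34)

Having verified (4.32) for $u = \phi_{\ell'}$ we next verify that this implies that it holds
for all $u \in \mathcal{U}$. By Proposition 4.6.4 we obtain that any $u \in \mathcal{U}$ can be written as
$u = \sum_{\ell'=1}^{d} \beta_{\ell'} \phi_{\ell'}$, where $\beta_{\ell'} = \langle u, \phi_{\ell'} \rangle$. Consequently,

$\Bigl\langle v - \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell, \ u \Bigr\rangle = \Bigl\langle v - \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell, \ \sum_{\ell'=1}^{d} \beta_{\ell'} \phi_{\ell'} \Bigr\rangle$
$= \sum_{\ell'=1}^{d} \beta_{\ell'}^* \Bigl\langle v - \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell, \ \phi_{\ell'} \Bigr\rangle$
$= 0, \quad u \in \mathcal{U},$

where the third equality follows from (4.34) and the basic properties of the inner
product (3.6)-(3.10).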

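We next prove Part (ii) by showing that if $w, w' \in \mathcal{U}$ satisfy

$\langle v - w, u \rangle = 0, \quad u \in \mathcal{U}$ (4.35)

and

$\langle v - w', u \rangle = 0, \quad u \in \mathcal{U},$ (4.36)

then w = w'.

This follows from the calculation:

$w - w' = \sum_{\ell=1}^{d} \langle w, \phi_\ell \rangle\, \phi_\ell - \sum_{\ell=1}^{d} \langle w', \phi_\ell \rangle\, \phi_\ell$
$= \sum_{\ell=1}^{d} \langle w - w', \phi_\ell \rangle\, \phi_\ell$
$= \sum_{\ell=1}^{d} \bigl\langle (v - w') - (v - w), \phi_\ell \bigr\rangle\, \phi_\ell$
$= \sum_{\ell=1}^{d} \bigl( \langle v - w', \phi_\ell \rangle - \langle v - w, \phi_\ell \rangle \bigr)\, \phi_\ell$
$= \sum_{\ell=1}^{d} (0 - 0)\, \phi_\ell$
$= 0,$

where the first equality follows from Proposition 4.6.4; the second by the linearity of
the inner product in its left argument (3.9); the third by adding and subtracting v;
the fourth by the linearity of the inner product in its left argument (3.9); and the
fifth equality from (4.35) & (4.36) applied by substituting $\phi_\ell$ for u. □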

With the aid of the above lemma we can now define the projection of a signal onto
a finite-dimensional subspace that has an orthonormal basis.²

Definition 4.6.6 (Projection of $v \in \mathcal{L}_2$ onto $\mathcal{U}$). Let $\mathcal{U} \subset \mathcal{L}_2$ be a
finite-dimensional linear subspace of $\mathcal{L}_2$ having an orthonormal basis. Let $v \in \mathcal{L}_2$ be an
arbitrary energy-limited signal. Then the projection of v onto $\mathcal{U}$ is the unique
element w of $\mathcal{U}$ such that

$\langle v - w, u \rangle = 0, \quad u \in \mathcal{U}.$ (4.37)

Note 4.6.7. By Lemma 4.6.5 it follows that if $(\phi_1, \ldots, \phi_d)$ is an orthonormal basis
for $\mathcal{U}$, then the projection of $v \in \mathcal{L}_2$ onto $\mathcal{U}$ is given by

$\sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell.$ (4.38)

To further develop the geometric picture of $\mathcal{L}_2$, we next show that, loosely speaking,
the projection of $v \in \mathcal{L}_2$ onto $\mathcal{U}$ is the element in $\mathcal{U}$ that is closest to v. This result
can also be viewed as an optimal approximation result: if we wish to approximate v
by an element of $\mathcal{U}$, then the optimal approximation is the projection of v onto $\mathcal{U}$,
provided that we measure the quality of our approximation using the energy in the
error signal.

Proposition 4.6.8 (Projection as Best Approximation). Let $\mathcal{U} \subset \mathcal{L}_2$ be a
finite-dimensional subspace of $\mathcal{L}_2$ having an orthonormal basis $(\phi_1, \ldots, \phi_d)$. Let $v \in \mathcal{L}_2$
be arbitrary. Then the projection of v onto $\mathcal{U}$ is the element $w \in \mathcal{U}$ that, among
all the elements of $\mathcal{U}$, is closest to v in the sense that

$\|v - w\|_2 \le \|v - u\|_2, \quad u \in \mathcal{U}.$ (4.39)



² A projection can also be defined if the subspace does not have an orthonormal basis, but in
this case there is a uniqueness issue. There may be numerous vectors $w \in \mathcal{U}$ such that v - w is
orthogonal to all vectors in $\mathcal{U}$. Fortunately, they are all indistinguishable.

Proof. Let w be the projection of v onto $\mathcal{U}$ and let u be an arbitrary signal in $\mathcal{U}$.
Since, by the definition of projection, w is in $\mathcal{U}$ and since $\mathcal{U}$ is a linear subspace,
it follows that $w - u \in \mathcal{U}$. Consequently, since by the definition of the projection
v - w is orthogonal to every element of $\mathcal{U}$, it follows that v - w is a fortiori
orthogonal to w - u. Thus

$\|v - u\|_2^2 = \|(v - w) + (w - u)\|_2^2$
$= \|v - w\|_2^2 + \|w - u\|_2^2$ (4.40)
$\ge \|v - w\|_2^2,$ (4.41)

where the first equality follows by subtracting and adding w, the second equality
from the orthogonality of (v - w) and (w - u), and the final inequality by the
nonnegativity of $\|\cdot\|_2$. It follows from (4.41) that no signal in $\mathcal{U}$ is closer to v
than w is. And it follows from (4.40) that if $u \in \mathcal{U}$ is as close to v as w is,
then u - w must be an element of $\mathcal{U}$ that is of zero energy. We shall see in
Proposition 4.6.10 that the hypothesis that $\mathcal{U}$ has an orthonormal basis implies
that the only zero-energy element of $\mathcal{U}$ is 0. Thus u and w must be identical, and
no other element of $\mathcal{U}$ is as close to v as w is. □
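
The best-approximation property lends itself to a numerical experiment. The sketch below (our own illustration) projects the signal $t \mapsto |t|$, restricted to [-1, 1], onto the span of three orthonormal signals via (4.38) and compares the error energy with that of another element of the span. The orthonormal tuple (normalized Legendre polynomials on [-1, 1]), the perturbation used to build the competitor, and the helper names are all assumptions chosen for illustration.

```python
import math

def inner(u, v, lo=-1.0, hi=1.0, n=20000):
    """Midpoint-rule approximation of <u, v> for signals supported on [lo, hi]."""
    dt = (hi - lo) / n
    total = 0j
    for k in range(n):
        t = lo + (k + 0.5) * dt
        total += u(t) * complex(v(t)).conjugate()
    return total * dt

phi = [  # an orthonormal 3-tuple on [-1, 1] (assumed example)
    lambda t: math.sqrt(1 / 2),
    lambda t: math.sqrt(3 / 2) * t,
    lambda t: math.sqrt(5 / 8) * (3 * t * t - 1),
]
v = lambda t: abs(t)                                       # signal to approximate

coords = [inner(v, p) for p in phi]                        # <v, phi_l>
w = lambda t: sum(c * p(t) for c, p in zip(coords, phi))   # projection, as in (4.38)

def error_energy(approx):
    """Energy of the error signal v - approx."""
    diff = lambda t: v(t) - approx(t)
    return inner(diff, diff).real

# A competing element of the span: the projection with two coordinates perturbed.
other_coords = [coords[0] + 0.1, coords[1], coords[2] - 0.05j]
other = lambda t: sum(c * p(t) for c, p in zip(other_coords, phi))

residual = lambda t: v(t) - w(t)
residual_inner = [abs(inner(residual, p)) for p in phi]    # all should be near 0
err_w, err_other = error_energy(w), error_energy(other)
print(residual_inner, err_w, err_other)
```

The gap between the two error energies equals the energy of the difference between the competitor and the projection, here $0.1^2 + 0.05^2$, in agreement with (4.40).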



4.6.4 Energy, Inner Products, and Orthonormal Bases

As demonstrated by Proposition 4.6.4, if $(\phi_1, \ldots, \phi_d)$ forms an orthonormal basis
for the subspace $\mathcal{U} \subset \mathcal{L}_2$, then any signal $u \in \mathcal{U}$ can be reconstructed from the d
numbers $\langle u, \phi_1 \rangle, \ldots, \langle u, \phi_d \rangle$. Any quantity that can be computed from u can thus
be computed from $\langle u, \phi_1 \rangle, \ldots, \langle u, \phi_d \rangle$ by first reconstructing u and by then
performing the calculation on u. But some calculations involving u can be performed
based on $\langle u, \phi_1 \rangle, \ldots, \langle u, \phi_d \rangle$ much more easily.

Proposition 4.6.9. Let $(\phi_1, \ldots, \phi_d)$ be an orthonormal basis for the linear subspace
$\mathcal{U} \subset \mathcal{L}_2$.

(i) The energy $\|u\|_2^2$ of every $u \in \mathcal{U}$ can be expressed in terms of the d inner
products $\langle u, \phi_1 \rangle, \ldots, \langle u, \phi_d \rangle$ as

$\|u\|_2^2 = \sum_{\ell=1}^{d} |\langle u, \phi_\ell \rangle|^2.$ (4.42)

(ii) More generally, if $v \in \mathcal{L}_2$ (not necessarily in $\mathcal{U}$), then

$\|v\|_2^2 \ge \sum_{\ell=1}^{d} |\langle v, \phi_\ell \rangle|^2$ (4.43)

with equality if, and only if, v is indistinguishable from some signal in $\mathcal{U}$.

(iii) The inner product between any $v \in \mathcal{L}_2$ and any $u \in \mathcal{U}$ can be expressed in
terms of the inner products $\{\langle v, \phi_\ell \rangle\}$ and $\{\langle u, \phi_\ell \rangle\}$ as

$\langle v, u \rangle = \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle \langle u, \phi_\ell \rangle^*.$ (4.44)
Proof. Part (i) follows directly from the Pythagorean Theorem (Theorem 4.5.2)
applied to the d-tuple $\bigl( \langle u, \phi_1 \rangle\, \phi_1, \ldots, \langle u, \phi_d \rangle\, \phi_d \bigr)$.

To prove Part (ii) we expand the energy in v as

$\|v\|_2^2 = \Bigl\| \Bigl( v - \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell \Bigr) + \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell \Bigr\|_2^2$
$= \Bigl\| v - \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell \Bigr\|_2^2 + \Bigl\| \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell \Bigr\|_2^2$
$= \Bigl\| v - \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell \Bigr\|_2^2 + \sum_{\ell=1}^{d} |\langle v, \phi_\ell \rangle|^2$
$\ge \sum_{\ell=1}^{d} |\langle v, \phi_\ell \rangle|^2,$ (4.45)

where the first equality follows by subtracting and adding the projection of v
onto $\mathcal{U}$; the second from the Pythagorean Theorem and by Lemma 4.6.5, which
guarantees that the difference between v and its projection is orthogonal to any
signal in $\mathcal{U}$ and hence a fortiori also to the projection itself; the third by Part (i)
applied to the projection of v onto $\mathcal{U}$; and the final inequality by the nonnegativity
of energy.

If Inequality (4.45) holds with equality, then the last inequality in its derivation
must hold with equality, so $\bigl\| v - \sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell \bigr\|_2 = 0$ and hence v must be
indistinguishable from the signal $\sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle\, \phi_\ell$, which is in $\mathcal{U}$.
Conversely, if v is indistinguishable from some $u' \in \mathcal{U}$, then

$\|v\|_2^2 = \|(v - u') + u'\|_2^2$
$= \|v - u'\|_2^2 + \|u'\|_2^2$
$= \|u'\|_2^2$
$= \sum_{\ell=1}^{d} |\langle u', \phi_\ell \rangle|^2$
$= \sum_{\ell=1}^{d} |\langle v, \phi_\ell \rangle + \langle u' - v, \phi_\ell \rangle|^2$
$= \sum_{\ell=1}^{d} |\langle v, \phi_\ell \rangle|^2,$

where the first equality follows by subtracting and adding u'; the second follows
from the Pythagorean Theorem because the fact that $\|v - u'\|_2 = 0$ implies that
$\langle v - u', u' \rangle = 0$ (as can be readily verified using the Cauchy-Schwarz Inequality
$|\langle v - u', u' \rangle| \le \|v - u'\|_2\, \|u'\|_2$); the third from our assumption that v and u' are
indistinguishable; the fourth from Part (i) applied to the function u' (which is in $\mathcal{U}$);
the fifth by adding and subtracting v; and where the final equality follows because
$\langle u' - v, \phi_\ell \rangle = 0$ (as can be readily verified from the Cauchy-Schwarz Inequality
$|\langle u' - v, \phi_\ell \rangle| \le \|u' - v\|_2\, \|\phi_\ell\|_2$).

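To prove Part (iii) we compute $\langle \mathbf{v}, \mathbf{u}\rangle$ as

$$\begin{aligned}
\langle \mathbf{v}, \mathbf{u}\rangle
&= \Bigl\langle \mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell\rangle \boldsymbol{\phi}_\ell + \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell\rangle \boldsymbol{\phi}_\ell, \; \mathbf{u} \Bigr\rangle \\
&= \Bigl\langle \mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell\rangle \boldsymbol{\phi}_\ell, \; \mathbf{u} \Bigr\rangle + \Bigl\langle \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell\rangle \boldsymbol{\phi}_\ell, \; \mathbf{u} \Bigr\rangle \\
&= \Bigl\langle \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell\rangle \boldsymbol{\phi}_\ell, \; \mathbf{u} \Bigr\rangle \\
&= \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell\rangle \, \langle \boldsymbol{\phi}_\ell, \mathbf{u}\rangle \\
&= \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell\rangle \, \langle \mathbf{u}, \boldsymbol{\phi}_\ell\rangle^{*},
\end{aligned}$$

where the first equality follows by subtracting and adding $\sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell\rangle \boldsymbol{\phi}_\ell$; the second by the linearity of the inner product in its left argument (3.9); the third because, by Lemma 4.6.5, the signal $\mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell\rangle \boldsymbol{\phi}_\ell$ is orthogonal to any signal in $\mathcal{U}$ and a fortiori to $\mathbf{u}$; the fourth by the linearity of the inner product in its left argument (3.7) & (3.9); and the final equality by (3.6). □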

Proposition 4.6.9 has interesting consequences. It shows that if one thinks of $\langle \mathbf{u}, \boldsymbol{\phi}_\ell\rangle$ as the $\ell$-th coordinate of $\mathbf{u}$ (with respect to the orthonormal basis $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$), then the energy in $\mathbf{u}$ is simply the sum of the squared magnitudes of the coordinates, and the inner product between two functions is the sum of the products of each coordinate of $\mathbf{u}$ and the conjugate of the corresponding coordinate of $\mathbf{v}$.
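The three identities of Proposition 4.6.9 can be checked numerically in a finite-dimensional stand-in. The sketch below is only an illustration under stated assumptions: vectors in C^4 play the role of signals, the ordinary dot product with conjugation plays the role of the inner product, and the names `inner`, `energy`, `coordinates`, `phi`, `u`, and `v` are hypothetical.

```python
# Finite-dimensional stand-in (an assumption, not the setting of the text):
# vectors in C^4 replace signals, and <a, b> = sum_k a_k * conj(b_k).

def inner(a, b):
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))

def energy(a):
    return inner(a, a).real

# An orthonormal basis (phi_1, phi_2) for a 2-dimensional subspace U of C^4.
s = 2 ** -0.5
phi = [
    [s, s, 0, 0],
    [0, 0, s, 1j * s],
]

def coordinates(a):
    """The inner products <a, phi_1>, ..., <a, phi_d>."""
    return [inner(a, p) for p in phi]

# u is in U by construction; v is a generic vector that is not in U.
u = [(1 + 2j) * a + (3 - 1j) * b for a, b in zip(phi[0], phi[1])]
v = [1, -1, 2j, 0.5]

cu, cv = coordinates(u), coordinates(v)

energy_u_from_coords = sum(abs(c) ** 2 for c in cu)                      # (4.42)
bessel_sum_v = sum(abs(c) ** 2 for c in cv)                              # (4.43)
inner_vu_from_coords = sum(a * b.conjugate() for a, b in zip(cv, cu))    # (4.44)

print(energy(u), energy_u_from_coords)
print(energy(v), bessel_sum_v)
print(inner(v, u), inner_vu_from_coords)
```

For `u` the two energies agree, for `v` the coordinate sum is strictly smaller than the energy (since `v` is not in the subspace), and the inner product computed from coordinates matches the direct one.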

We hope that the properties of orthonormal bases that we presented above have 
convinced the reader by now that there are certain advantages to describing func- 
tions using an orthonormal basis. A crucial question arises as to whether orthonor- 
mal bases always exist. This question is addressed next. 

4.6.5 Does an Orthonormal Basis Exist? 

Word on the street has it that every finite-dimensional subspace of $\mathcal{L}_2$ has an orthonormal basis, but this is not true. (It is true for the space $L_2$ that we shall encounter later.) For example, the set

$$\bigl\{\mathbf{u} \in \mathcal{L}_2 : u(t) = 0 \text{ whenever } t \neq 17\bigr\}$$

of all energy-limited signals that map $t$ to zero whenever $t \neq 17$ (with the value to which $t = 17$ is mapped being unspecified) is a one-dimensional subspace of $\mathcal{L}_2$ that does not have an orthonormal basis. (All the signals in this subspace are of zero energy, so there are no unit-energy signals in it.)

Proposition 4.6.10. If $\mathcal{U}$ is a finite-dimensional subspace of $\mathcal{L}_2$, then the following two statements are equivalent:

(a) $\mathcal{U}$ has an orthonormal basis.

(b) The only element of $\mathcal{U}$ of zero energy is the all-zero signal $\mathbf{0}$.

Proof. The proof has two parts. The first consists of showing that (a) $\Rightarrow$ (b), i.e., that if $\mathcal{U}$ has an orthonormal basis and if $\mathbf{u} \in \mathcal{U}$ is of zero energy, then $\mathbf{u}$ must be the all-zero signal $\mathbf{0}$. The second part consists of showing that (b) $\Rightarrow$ (a), i.e., that if the only element of zero energy in $\mathcal{U}$ is the all-zero signal $\mathbf{0}$, then $\mathcal{U}$ has an orthonormal basis.

We begin with the first part, namely, (a) $\Rightarrow$ (b). We thus assume that $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ is an orthonormal basis for $\mathcal{U}$ and that $\mathbf{u} \in \mathcal{U}$ satisfies $\|\mathbf{u}\|_2 = 0$ and proceed to prove that $\mathbf{u} = \mathbf{0}$. We simply note that, by the Cauchy-Schwarz Inequality, $|\langle \mathbf{u}, \boldsymbol{\phi}_\ell\rangle| \le \|\mathbf{u}\|_2 \, \|\boldsymbol{\phi}_\ell\|_2$, so the condition $\|\mathbf{u}\|_2 = 0$ implies

$$\langle \mathbf{u}, \boldsymbol{\phi}_\ell\rangle = 0, \quad \ell \in \{1, \ldots, d\}, \tag{4.46}$$

and hence, by Proposition 4.6.4, that $\mathbf{u} = \mathbf{0}$.

To show (b) $\Rightarrow$ (a) we need to show that if no signal in $\mathcal{U}$ other than $\mathbf{0}$ has zero energy, then $\mathcal{U}$ has an orthonormal basis. The proof is based on the Gram-Schmidt Procedure, which is presented next. As we shall prove, if the input to this procedure is a basis for $\mathcal{U}$ and if no element of $\mathcal{U}$ other than $\mathbf{0}$ is of energy zero, then the procedure produces an orthonormal basis for $\mathcal{U}$. The procedure is actually even more powerful. If it is fed a basis for a subspace that does contain an element other than $\mathbf{0}$ of zero energy, then the procedure produces such an element and halts.

It should be emphasized that the Gram-Schmidt Procedure is not only useful for proving theorems; it can be quite useful for finding orthonormal bases for practical problems.³ □

4.6.6 The Gram-Schmidt Procedure 

The Gram-Schmidt Procedure is named after the mathematicians Jørgen Pedersen Gram (1850-1916) and Erhard Schmidt (1876-1959). However, as pointed out in (Farebrother, 1988), this procedure was apparently already presented by Pierre-Simon Laplace (1749-1827) and was used by Augustin Louis Cauchy (1789-1857).

The input to the Gram-Schmidt Procedure is a basis $(\mathbf{u}_1, \ldots, \mathbf{u}_d)$ for a $d$-dimensional subspace $\mathcal{U} \subset \mathcal{L}_2$. We assume that $d \ge 1$. (The only 0-dimensional subspace of $\mathcal{L}_2$ is the subspace $\{\mathbf{0}\}$ containing the all-zero signal only, and for this subspace the empty tuple is an orthonormal basis; there is not much else to say here.) If $\mathcal{U}$ does not contain a signal of zero energy other than the all-zero signal $\mathbf{0}$, then the procedure runs in $d$ steps and produces an orthonormal basis for $\mathcal{U}$ (and thus also proves that $\mathcal{U}$ does not contain a zero-energy signal other than $\mathbf{0}$). Otherwise, the procedure stops after $d$ or fewer steps and produces an element of $\mathcal{U}$ of zero energy other than $\mathbf{0}$.

3 Numerically, however, it is unstable; see (Golub and van Loan, 1996).

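The Gram-Schmidt Procedure:

Step 1: If $\|\mathbf{u}_1\|_2 = 0$, then the procedure declares that there exists a zero-energy element of $\mathcal{U}$ other than $\mathbf{0}$, it produces $\mathbf{u}_1$ as proof, and it halts. Otherwise, it defines

$$\boldsymbol{\phi}_1 = \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|_2}$$

and halts with the output $(\boldsymbol{\phi}_1)$ (if $d = 1$) or proceeds to Step 2 (if $d > 1$).

Assuming that the procedure has run for $\nu - 1$ steps without halting and has defined the vectors $\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\nu-1}$, we next describe Step $\nu$.

Step $\nu$: Consider the signal

$$\tilde{\mathbf{u}}_\nu = \mathbf{u}_\nu - \sum_{\ell=1}^{\nu-1} \langle \mathbf{u}_\nu, \boldsymbol{\phi}_\ell\rangle \boldsymbol{\phi}_\ell. \tag{4.47}$$

If $\|\tilde{\mathbf{u}}_\nu\|_2 = 0$, then the procedure declares that there exists a zero-energy element of $\mathcal{U}$ other than $\mathbf{0}$, it produces $\tilde{\mathbf{u}}_\nu$ as proof, and it halts. Otherwise, the procedure defines

$$\boldsymbol{\phi}_\nu = \frac{\tilde{\mathbf{u}}_\nu}{\|\tilde{\mathbf{u}}_\nu\|_2} \tag{4.48}$$

and halts with the output $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ (if $\nu$ is equal to $d$) or proceeds to Step $\nu + 1$ (if $\nu < d$).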
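The steps above can be sketched in code. The following is a minimal illustration, not the book's own implementation, and it rests on assumptions that should be read as such: signals are replaced by vectors in C^n, and the inner product is replaced by a weighted semi-inner product whose weights `w` may be zero at some positions. A zero weight imitates a set of Lebesgue measure zero, so that a nonzero vector can have zero energy. The names `inner`, `gram_schmidt`, and the tolerance `tol` (a floating-point substitute for the test "energy equals zero") are hypothetical.

```python
import math

def inner(a, b, w):
    """Semi-inner product <a, b> = sum_k w[k] * a[k] * conj(b[k]), w[k] >= 0."""
    return sum(wk * x * complex(y).conjugate() for wk, x, y in zip(w, a, b))

def gram_schmidt(basis, w, tol=1e-12):
    """Return ("basis", [phi_1, ..., phi_d]) or ("zero_energy", witness)."""
    phis = []
    for u in basis:
        # (4.47): remove the components of u along phi_1, ..., phi_{nu-1}.
        u_tilde = list(u)
        for p in phis:
            c = inner(u, p, w)
            u_tilde = [x - c * y for x, y in zip(u_tilde, p)]
        norm = math.sqrt(inner(u_tilde, u_tilde, w).real)
        if norm <= tol:          # floating-point stand-in for "energy = 0"
            return "zero_energy", u_tilde
        # (4.48): scale to unit energy.
        phis.append([x / norm for x in u_tilde])
    return "basis", phis

w = [1.0, 1.0, 0.0]   # the third position carries no weight ("measure zero")

# A subspace with no nonzero zero-energy element: the procedure runs d steps.
status_ok, phis = gram_schmidt([[1, 1, 0], [1, -1, 0]], w)

# A subspace containing a nonzero element of zero energy: the procedure halts
# at Step 2 and hands back that element as its proof.
status_bad, witness = gram_schmidt([[1, 0, 0], [1, 0, 5]], w)

print(status_ok, phis)
print(status_bad, witness)
```

In the second call the witness is supported entirely on the zero-weight position, which mirrors the zero-energy signal that the procedure is claimed to produce.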

We next prove that the procedure behaves as we claim. 

Proof. To prove that the procedure behaves as we claim, we shall assume that the procedure performs Step $\nu$ (i.e., that it has not halted in the steps preceding $\nu$) and prove the following: if at Step $\nu$ the procedure declares that $\mathcal{U}$ contains a nonzero signal of zero energy and produces $\tilde{\mathbf{u}}_\nu$ as proof, then this is indeed the case; otherwise, if it defines $\boldsymbol{\phi}_\nu$ as in (4.48), then $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\nu)$ is an orthonormal basis for $\operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_\nu)$.

We prove this by induction on $\nu$. For $\nu = 1$ this can be verified as follows. If $\|\mathbf{u}_1\|_2 = 0$, then we need to show that $\mathbf{u}_1 \in \mathcal{U}$ and that it is not equal to $\mathbf{0}$. This follows from the assumption that the procedure's input $(\mathbf{u}_1, \ldots, \mathbf{u}_d)$ forms a basis for $\mathcal{U}$, so a fortiori the signals $\mathbf{u}_1, \ldots, \mathbf{u}_d$ must all be elements of $\mathcal{U}$ and none of them can be the all-zero signal. If $\|\mathbf{u}_1\|_2 > 0$, then $\boldsymbol{\phi}_1$ is a unit-energy scaled version of $\mathbf{u}_1$ and thus $(\boldsymbol{\phi}_1)$ is an orthonormal basis for $\operatorname{span}(\mathbf{u}_1)$.

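We now assume that our claim is true for $\nu - 1$ and proceed to prove that it is also true for $\nu$. We thus assume that Step $\nu$ is executed and that $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\nu-1})$ is an orthonormal basis for $\operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_{\nu-1})$:

$$\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\nu-1} \in \mathcal{U}; \tag{4.49}$$

$$\operatorname{span}(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\nu-1}) = \operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_{\nu-1}); \tag{4.50}$$

and

$$\langle \boldsymbol{\phi}_\ell, \boldsymbol{\phi}_{\ell'}\rangle = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \{1, \ldots, \nu - 1\}. \tag{4.51}$$

We need to prove that if $\tilde{\mathbf{u}}_\nu$ is of zero energy, then it is a nonzero element of $\mathcal{U}$ of zero energy, and that otherwise the $\nu$-tuple $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\nu)$ is an orthonormal basis for $\operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_\nu)$. To that end we first prove that

$$\tilde{\mathbf{u}}_\nu \in \mathcal{U} \tag{4.52}$$

and that

$$\tilde{\mathbf{u}}_\nu \neq \mathbf{0}. \tag{4.53}$$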

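We begin with a proof of (4.52). Since (4.47) expresses $\tilde{\mathbf{u}}_\nu$ as a linear combination of $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\nu-1}, \mathbf{u}_\nu)$, and since $\mathcal{U}$ is by assumption a linear subspace, it suffices to show that $\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\nu-1} \in \mathcal{U}$ and that $\mathbf{u}_\nu \in \mathcal{U}$. The former follows from (4.49) and the latter from our assumption that $(\mathbf{u}_1, \ldots, \mathbf{u}_d)$ forms a basis for $\mathcal{U}$.

We next prove (4.53). By (4.47) it suffices to show that $\mathbf{u}_\nu \notin \operatorname{span}(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\nu-1})$. By (4.50) this is equivalent to showing that $\mathbf{u}_\nu \notin \operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_{\nu-1})$, which follows from our assumption that $(\mathbf{u}_1, \ldots, \mathbf{u}_d)$ is a basis for $\mathcal{U}$ and a fortiori linearly independent.

Having established (4.52) and (4.53) it follows that if $\|\tilde{\mathbf{u}}_\nu\|_2 = 0$, then $\tilde{\mathbf{u}}_\nu$ is a nonzero element of $\mathcal{U}$ which is of zero energy, as we had claimed.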

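To conclude the proof we now assume $\|\tilde{\mathbf{u}}_\nu\|_2 > 0$ and prove that $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\nu)$ is an orthonormal basis for $\operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_\nu)$. That $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\nu)$ is orthonormal follows because (4.51) guarantees that $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\nu-1})$ is orthonormal; because (4.48) guarantees that $\boldsymbol{\phi}_\nu$ is of unit energy; and because Lemma 4.6.5 (applied to the linear subspace $\operatorname{span}(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\nu-1})$) guarantees that $\tilde{\mathbf{u}}_\nu$ (and hence also its scaled version $\boldsymbol{\phi}_\nu$) is orthogonal to every element of $\operatorname{span}(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\nu-1})$ and in particular to $\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\nu-1}$. It thus only remains to show that $\operatorname{span}(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\nu) = \operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_\nu)$. We first show that $\operatorname{span}(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\nu) \subseteq \operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_\nu)$. This follows because (4.50) implies that

$$\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\nu-1} \in \operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_{\nu-1}); \tag{4.54}$$

because (4.54), (4.47) and (4.48) imply that

$$\boldsymbol{\phi}_\nu \in \operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_\nu); \tag{4.55}$$

and because (4.54) and (4.55) imply that $\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\nu \in \operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_\nu)$ and hence that $\operatorname{span}(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\nu) \subseteq \operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_\nu)$. The reverse inclusion can be argued very similarly: by (4.50)

$$\mathbf{u}_1, \ldots, \mathbf{u}_{\nu-1} \in \operatorname{span}(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\nu-1}); \tag{4.56}$$

by (4.47) and (4.48) we can express $\mathbf{u}_\nu$ as a linear combination of $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\nu)$

$$\mathbf{u}_\nu = \|\tilde{\mathbf{u}}_\nu\|_2 \, \boldsymbol{\phi}_\nu + \sum_{\ell=1}^{\nu-1} \langle \mathbf{u}_\nu, \boldsymbol{\phi}_\ell\rangle \boldsymbol{\phi}_\ell; \tag{4.57}$$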

and (4.56) & (4.57) combine to prove that $\mathbf{u}_1, \ldots, \mathbf{u}_\nu \in \operatorname{span}(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\nu)$ and hence that $\operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_\nu) \subseteq \operatorname{span}(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\nu)$. □

By far the more important scenario for us is when $\mathcal{U}$ does not contain a nonzero element of zero energy. This is because we shall mostly focus on signals that are bandlimited (see Chapter 6), and the only energy-limited signal that is bandlimited to W Hz and that has zero energy is the all-zero signal (Note 6.4.2). For subspaces not containing zero-energy signals other than $\mathbf{0}$, the key properties to note about the signals $\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d$ produced by the Gram-Schmidt procedure are that they satisfy for each $\nu \in \{1, \ldots, d\}$

$$\operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_\nu) = \operatorname{span}(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\nu) \tag{4.58a}$$

and

$$(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\nu) \text{ is an orthonormal basis for } \operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_\nu). \tag{4.58b}$$

These properties are, of course, of greatest importance when $\nu = d$.
We next provide an example of the Gram-Schmidt procedure.

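Example 4.6.11. Consider the following three signals: $\mathbf{u}_1\colon t \mapsto \mathrm{I}\{0 < t < 1\}$, $\mathbf{u}_2\colon t \mapsto t\, \mathrm{I}\{0 < t < 1\}$, and $\mathbf{u}_3\colon t \mapsto t^2\, \mathrm{I}\{0 < t < 1\}$. The tuple $(\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3)$ forms a basis for the subspace of all signals of the form $t \mapsto p(t)\, \mathrm{I}\{0 < t < 1\}$, where $p(\cdot)$ is a polynomial of degree smaller than 3. To construct an orthonormal basis for this subspace with the Gram-Schmidt Procedure, we begin by normalizing $\mathbf{u}_1$. To that end, we compute

$$\|\mathbf{u}_1\|_2^2 = \int_{-\infty}^{\infty} \bigl|\mathrm{I}\{0 < t < 1\}\bigr|^2 \, dt = 1$$

and set $\boldsymbol{\phi}_1 = \mathbf{u}_1 / \|\mathbf{u}_1\|_2$, so

$$\boldsymbol{\phi}_1\colon t \mapsto \mathrm{I}\{0 < t < 1\}. \tag{4.59a}$$

The second function $\boldsymbol{\phi}_2$ is now obtained by normalizing $\mathbf{u}_2 - \langle \mathbf{u}_2, \boldsymbol{\phi}_1\rangle \boldsymbol{\phi}_1$. We first compute the inner product $\langle \mathbf{u}_2, \boldsymbol{\phi}_1\rangle$

$$\langle \mathbf{u}_2, \boldsymbol{\phi}_1\rangle = \int_{-\infty}^{\infty} \mathrm{I}\{0 < t < 1\}\, t\, \mathrm{I}\{0 < t < 1\} \, dt = \int_0^1 t \, dt = \frac{1}{2}$$

to obtain that $\mathbf{u}_2 - \langle \mathbf{u}_2, \boldsymbol{\phi}_1\rangle \boldsymbol{\phi}_1\colon t \mapsto (t - 1/2)\, \mathrm{I}\{0 < t < 1\}$, which is of energy

$$\bigl\|\mathbf{u}_2 - \langle \mathbf{u}_2, \boldsymbol{\phi}_1\rangle \boldsymbol{\phi}_1\bigr\|_2^2 = \int_0^1 \Bigl(t - \frac{1}{2}\Bigr)^2 dt = \frac{1}{12}.$$

Hence,

$$\boldsymbol{\phi}_2\colon t \mapsto \sqrt{12}\, \Bigl(t - \frac{1}{2}\Bigr)\, \mathrm{I}\{0 < t < 1\}. \tag{4.59b}$$

The third function $\boldsymbol{\phi}_3$ is the normalized version of $\mathbf{u}_3 - \langle \mathbf{u}_3, \boldsymbol{\phi}_1\rangle \boldsymbol{\phi}_1 - \langle \mathbf{u}_3, \boldsymbol{\phi}_2\rangle \boldsymbol{\phi}_2$. The inner products $\langle \mathbf{u}_3, \boldsymbol{\phi}_1\rangle$ and $\langle \mathbf{u}_3, \boldsymbol{\phi}_2\rangle$ are respectively

$$\langle \mathbf{u}_3, \boldsymbol{\phi}_1\rangle = \int_0^1 t^2 \, dt = \frac{1}{3},$$

$$\langle \mathbf{u}_3, \boldsymbol{\phi}_2\rangle = \int_0^1 t^2 \sqrt{12}\, \Bigl(t - \frac{1}{2}\Bigr) dt = \frac{1}{\sqrt{12}}.$$

Consequently

$$\mathbf{u}_3 - \langle \mathbf{u}_3, \boldsymbol{\phi}_1\rangle \boldsymbol{\phi}_1 - \langle \mathbf{u}_3, \boldsymbol{\phi}_2\rangle \boldsymbol{\phi}_2\colon t \mapsto \Bigl(t^2 - \frac{1}{3} - \Bigl(t - \frac{1}{2}\Bigr)\Bigr)\, \mathrm{I}\{0 < t < 1\}$$

with corresponding energy

$$\bigl\|\mathbf{u}_3 - \langle \mathbf{u}_3, \boldsymbol{\phi}_1\rangle \boldsymbol{\phi}_1 - \langle \mathbf{u}_3, \boldsymbol{\phi}_2\rangle \boldsymbol{\phi}_2\bigr\|_2^2 = \int_0^1 \Bigl(t^2 - t + \frac{1}{6}\Bigr)^2 dt = \frac{1}{180}.$$

Hence, the orthonormal basis is completed by the third function

$$\boldsymbol{\phi}_3\colon t \mapsto \sqrt{180}\, \Bigl(t^2 - t + \frac{1}{6}\Bigr)\, \mathrm{I}\{0 < t < 1\}. \tag{4.59c}$$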
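The arithmetic of Example 4.6.11 can be reproduced exactly with rational numbers. The sketch below is an illustration under stated assumptions: a signal $t \mapsto p(t)\,\mathrm{I}\{0 < t < 1\}$ is represented by the coefficient list of the real polynomial $p$, the inner product is the integral of the product over $[0, 1]$, and the input polynomials are listed in order of increasing degree. To avoid square roots it computes the unnormalized signals of (4.47) and their energies; the function names `inner` and `residuals` are hypothetical.

```python
from fractions import Fraction

def inner(p, q):
    """Integral over [0, 1] of p(t) q(t) for real polynomials given as
    coefficient lists [c0, c1, c2, ...] (c_k multiplies t**k)."""
    return sum(Fraction(a) * Fraction(b) / (i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))

def residuals(polys):
    """The unnormalized signals of (4.47), one per input polynomial.

    Uses <u, phi> phi = (<u, r> / <r, r>) r when phi = r / ||r||, so no
    square roots are needed. Assumes increasing degrees in `polys`.
    """
    out = []
    for u in polys:
        r = [Fraction(c) for c in u]
        for prev in out:
            c = inner(u, prev) / inner(prev, prev)
            padded = prev + [Fraction(0)] * (len(r) - len(prev))
            r = [a - c * b for a, b in zip(r, padded)]
        out.append(r)
    return out

u1, u2, u3 = [1], [0, 1], [0, 0, 1]          # 1, t, t**2 on (0, 1)
r1, r2, r3 = residuals([u1, u2, u3])
energies = [inner(r, r) for r in (r1, r2, r3)]

print(r2, r3)       # coefficients of t - 1/2 and of t**2 - t + 1/6
print(energies)     # the energies 1, 1/12 and 1/180 found in the example
```

Dividing each residual by the square root of its energy gives the three functions (4.59a) through (4.59c).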



4.7 The Space L 2 

Very informally one can describe the space $L_2$ as the space of all energy-limited complex-valued signals, where we think of two signals as being different only if they are distinguishable. This section defines $L_2$ more precisely. It can be skipped because we shall have only little to do with $L_2$. Understanding this space is, however, important for readers who wish to fully understand how the Fourier Transform is defined for energy-limited signals that are not integrable (Section 6.2.3). Readers who continue should recall from Section 2.5 that two energy-limited signals $\mathbf{u}$ and $\mathbf{v}$ are said to be indistinguishable if the set $\{t \in \mathbb{R} : u(t) \neq v(t)\}$ is of Lebesgue measure zero. We write $\mathbf{u} \equiv \mathbf{v}$ to indicate that $\mathbf{u}$ and $\mathbf{v}$ are indistinguishable. By Proposition 2.5.3, the condition $\mathbf{u} \equiv \mathbf{v}$ is equivalent to the condition $\|\mathbf{u} - \mathbf{v}\|_2 = 0$.

To motivate the definition of the space $L_2$, we begin by noting that the space $\mathcal{L}_2$ of energy-limited signals is "almost" an example of what mathematicians call an "inner product space," but it is not. The problem is that mathematicians insist that in an inner product space the only vector whose inner product with itself is zero be the zero vector. This is not the case in $\mathcal{L}_2$: it is possible that $\mathbf{u} \in \mathcal{L}_2$ satisfy $\langle \mathbf{u}, \mathbf{u}\rangle = 0$ (i.e., $\|\mathbf{u}\|_2 = 0$) and yet not be the all-zero signal $\mathbf{0}$. From the condition $\|\mathbf{u}\|_2 = 0$ we can only infer that $\mathbf{u}$ is indistinguishable from $\mathbf{0}$.

The fact that $\mathcal{L}_2$ is not an inner product space is an annoyance because it precludes us from borrowing from the vast literature on inner product spaces (and Hilbert spaces, which are special kinds of inner product spaces), and because it does not allow us to view some of the results about $\mathcal{L}_2$ as instances of more general principles. For this reason mathematicians prefer to study the space $L_2$, which is an inner product space (and which is, in fact, a Hilbert space) rather than $\mathcal{L}_2$. Unfortunately, for this luxury they pay a certain price that I am loath to pay. Consequently, in most of this book I have decided to stick to $\mathcal{L}_2$ even though this precludes me from using the standard results on inner product spaces. The price one pays for using $L_2$ will become apparent once we define it.

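To understand how $L_2$ is constructed it is useful to note that the relation "$\mathbf{u} \equiv \mathbf{v}$", i.e., "$\mathbf{u}$ is indistinguishable from $\mathbf{v}$", is an equivalence relation on $\mathcal{L}_2$, i.e., it satisfies

$$\mathbf{u} \equiv \mathbf{u}, \quad \mathbf{u} \in \mathcal{L}_2; \qquad \text{(reflexive)}$$

$$\bigl(\mathbf{u} \equiv \mathbf{v}\bigr) \Leftrightarrow \bigl(\mathbf{v} \equiv \mathbf{u}\bigr), \quad \mathbf{u}, \mathbf{v} \in \mathcal{L}_2; \qquad \text{(symmetric)}$$

and

$$\bigl(\mathbf{u} \equiv \mathbf{v} \text{ and } \mathbf{v} \equiv \mathbf{w}\bigr) \Rightarrow \bigl(\mathbf{u} \equiv \mathbf{w}\bigr), \quad \mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathcal{L}_2. \qquad \text{(transitive)}$$

Using these properties one can verify that if for every $\mathbf{u} \in \mathcal{L}_2$ we define its equivalence class $[\mathbf{u}]$ as

$$[\mathbf{u}] \triangleq \bigl\{\tilde{\mathbf{u}} \in \mathcal{L}_2 : \tilde{\mathbf{u}} \equiv \mathbf{u}\bigr\}, \tag{4.60}$$

then two equivalence classes $[\mathbf{u}]$ and $[\mathbf{v}]$ must be either identical or disjoint. In fact, the sets $[\mathbf{u}] \subset \mathcal{L}_2$ and $[\mathbf{v}] \subset \mathcal{L}_2$ are identical if, and only if, $\mathbf{u}$ and $\mathbf{v}$ are indistinguishable

$$\bigl([\mathbf{u}] = [\mathbf{v}]\bigr) \Leftrightarrow \bigl(\|\mathbf{u} - \mathbf{v}\|_2 = 0\bigr), \quad \mathbf{u}, \mathbf{v} \in \mathcal{L}_2,$$

and they are disjoint if, and only if, $\mathbf{u}$ and $\mathbf{v}$ are distinguishable

$$\bigl([\mathbf{u}] \cap [\mathbf{v}] = \emptyset\bigr) \Leftrightarrow \bigl(\|\mathbf{u} - \mathbf{v}\|_2 > 0\bigr), \quad \mathbf{u}, \mathbf{v} \in \mathcal{L}_2.$$

We define $L_2$ as the set of all such equivalence classes

$$L_2 = \bigl\{[\mathbf{u}] : \mathbf{u} \in \mathcal{L}_2\bigr\}. \tag{4.61}$$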

Thus, the elements of $L_2$ are not functions, but sets of functions. Each element of $L_2$ is an equivalence class, i.e., a set of the form $[\mathbf{u}]$ for some $\mathbf{u} \in \mathcal{L}_2$. And for each $\mathbf{u} \in \mathcal{L}_2$ the equivalence class $[\mathbf{u}]$ is an element of $L_2$.

As we next show, the space $L_2$ can also be viewed as a vector space. To this end we need to first define "amplification of an equivalence class by a scalar $\alpha \in \mathbb{C}$" and "superposition of two equivalence classes." How do we define the scaling-by-$\alpha$ of an equivalence class $\mathcal{S} \in L_2$? A natural approach is to find some function $\mathbf{u} \in \mathcal{L}_2$ such that $\mathcal{S}$ is its equivalence class (i.e., satisfying $\mathcal{S} = [\mathbf{u}]$), and to define the scaling-by-$\alpha$ of $\mathcal{S}$ as the equivalence class of $\alpha \mathbf{u}$, i.e., as $[\alpha \mathbf{u}]$. Thus we would define $\alpha \mathcal{S}$ as the equivalence class of the signal $t \mapsto \alpha u(t)$. While this turns out to be a good approach, the careful reader might be concerned by something. Suppose that $\mathcal{S} = [\mathbf{u}]$ but that also $\mathcal{S} = [\tilde{\mathbf{u}}]$. Should $\alpha \mathcal{S}$ be defined as the equivalence class of $t \mapsto \alpha u(t)$ or of $t \mapsto \alpha \tilde{u}(t)$? Fortunately, it does not matter because the two equivalence classes are the same! Indeed, if $[\mathbf{u}] = [\tilde{\mathbf{u}}]$, then the equivalence class of $t \mapsto \alpha u(t)$ is equal to the equivalence class of $t \mapsto \alpha \tilde{u}(t)$ (because $[\mathbf{u}] = [\tilde{\mathbf{u}}]$ implies that $\mathbf{u}$ and $\tilde{\mathbf{u}}$ agree except on a set of measure zero so $\alpha \mathbf{u}$ and $\alpha \tilde{\mathbf{u}}$ also agree except on a set of measure zero, which in turn implies that $[\alpha \mathbf{u}] = [\alpha \tilde{\mathbf{u}}]$).

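Similarly, one can show that if $\mathcal{S}_1 \in L_2$ and $\mathcal{S}_2 \in L_2$ are two equivalence classes, then we can define their sum (or superposition) $\mathcal{S}_1 + \mathcal{S}_2$ as $[\mathbf{u}_1 + \mathbf{u}_2]$ where $\mathbf{u}_1$ is any function in $\mathcal{L}_2$ such that $\mathcal{S}_1 = [\mathbf{u}_1]$ and where $\mathbf{u}_2$ is any function in $\mathcal{L}_2$ such that $\mathcal{S}_2 = [\mathbf{u}_2]$. Again, to make sure that the result of the superposition of $\mathcal{S}_1$ and $\mathcal{S}_2$ does not depend on the choice of $\mathbf{u}_1$ and $\mathbf{u}_2$ we need to verify that if $\mathcal{S}_1 = [\mathbf{u}_1] = [\tilde{\mathbf{u}}_1]$ and if $\mathcal{S}_2 = [\mathbf{u}_2] = [\tilde{\mathbf{u}}_2]$ then $[\mathbf{u}_1 + \mathbf{u}_2] = [\tilde{\mathbf{u}}_1 + \tilde{\mathbf{u}}_2]$. This is not difficult but is omitted.

Using these definitions and by defining the zero vector to be the equivalence class $[\mathbf{0}]$, it is not difficult to show that $L_2$ forms a linear space over the complex field. To make it into an inner product space we need to define the inner product $\langle \mathcal{S}_1, \mathcal{S}_2\rangle$ between two equivalence classes. If $\mathcal{S}_1 = [\mathbf{u}_1]$ and if $\mathcal{S}_2 = [\mathbf{u}_2]$ we define the inner product $\langle \mathcal{S}_1, \mathcal{S}_2\rangle$ as the complex number $\langle \mathbf{u}_1, \mathbf{u}_2\rangle$. Again, we have to show that our definition is good in the sense that it does not depend on the particular choice of $\mathbf{u}_1$ and $\mathbf{u}_2$. More specifically, we need to verify that if $\mathcal{S}_1 = [\mathbf{u}_1] = [\tilde{\mathbf{u}}_1]$ and if $\mathcal{S}_2 = [\mathbf{u}_2] = [\tilde{\mathbf{u}}_2]$ then $\langle \mathbf{u}_1, \mathbf{u}_2\rangle = \langle \tilde{\mathbf{u}}_1, \tilde{\mathbf{u}}_2\rangle$. This can be proved as follows:

$$\begin{aligned}
\langle \mathbf{u}_1, \mathbf{u}_2\rangle
&= \langle \tilde{\mathbf{u}}_1 + (\mathbf{u}_1 - \tilde{\mathbf{u}}_1), \mathbf{u}_2\rangle \\
&= \langle \tilde{\mathbf{u}}_1, \mathbf{u}_2\rangle + \langle \mathbf{u}_1 - \tilde{\mathbf{u}}_1, \mathbf{u}_2\rangle \\
&= \langle \tilde{\mathbf{u}}_1, \mathbf{u}_2\rangle \\
&= \langle \tilde{\mathbf{u}}_1, \tilde{\mathbf{u}}_2 + (\mathbf{u}_2 - \tilde{\mathbf{u}}_2)\rangle \\
&= \langle \tilde{\mathbf{u}}_1, \tilde{\mathbf{u}}_2\rangle + \langle \tilde{\mathbf{u}}_1, \mathbf{u}_2 - \tilde{\mathbf{u}}_2\rangle \\
&= \langle \tilde{\mathbf{u}}_1, \tilde{\mathbf{u}}_2\rangle,
\end{aligned}$$

where the third equality follows because $[\mathbf{u}_1] = [\tilde{\mathbf{u}}_1]$ implies that $\|\mathbf{u}_1 - \tilde{\mathbf{u}}_1\|_2 = 0$ and hence that $\langle \mathbf{u}_1 - \tilde{\mathbf{u}}_1, \mathbf{u}_2\rangle = 0$ (Cauchy-Schwarz Inequality), and where the last equality follows by a similar reasoning about $\mathbf{u}_2$ and $\tilde{\mathbf{u}}_2$. Using the above definition of the inner product between equivalence classes one can show that if for some equivalence class $\mathcal{S}$ we have $\langle \mathcal{S}, \mathcal{S}\rangle = 0$, then $\mathcal{S}$ is the zero vector, i.e., the equivalence class $[\mathbf{0}]$.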
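The independence from the choice of representatives can be seen in a small toy model. This is only an analogy and every name in it is hypothetical: "signals" are vectors on five points, point `k` carries a nonnegative weight `w[k]`, and points of weight zero imitate a set of Lebesgue measure zero. Two vectors that differ only on zero-weight points are then "indistinguishable", and the inner product cannot tell the representatives apart.

```python
# Toy model (an assumption for illustration only): weights w[k] >= 0, where
# zero-weight points stand in for a set of Lebesgue measure zero.
w = [1.0, 0.5, 0.0, 2.0, 0.0]

def inner(a, b):
    return sum(wk * x * complex(y).conjugate() for wk, x, y in zip(w, a, b))

def indistinguishable(a, b):
    diff = [x - y for x, y in zip(a, b)]
    return inner(diff, diff).real == 0.0

# Two representatives of each of two "equivalence classes": the variants
# differ from the originals only at the zero-weight points 2 and 4.
u1, u1_alt = [1, 2j, 7, -1, 0], [1, 2j, -3, -1, 9]
u2, u2_alt = [0.5, 1, 1, 1j, 1], [0.5, 1, 4, 1j, -2]

print(indistinguishable(u1, u1_alt), indistinguishable(u2, u2_alt))
print(inner(u1, u2), inner(u1_alt, u2_alt))   # the same complex number

# A nonzero vector of zero energy: it is not the all-zero vector, but it is
# indistinguishable from it, so its class is the zero vector [0].
z = [0, 0, 3, 0, -1j]
print(inner(z, z).real, indistinguishable(z, [0, 0, 0, 0, 0]))
```

The last lines show the phenomenon that motivates the construction: in the space of vectors there is a nonzero element of zero energy, whereas after passing to classes it coincides with the zero vector.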

With these definitions of the scaling of an equivalence class by a scalar, the super- 
position of two equivalence classes, and the inner product between two equivalence 
classes, the space of equivalence classes L 2 becomes an inner product space in the 
sense that mathematicians like. In fact, it is a Hilbert space. 

What is the price we have to pay for working in an inner product space? It is that the elements of $L_2$ are not functions but equivalence classes and that it is meaningless to talk about the value they take at a given time. For example, it is meaningless to discuss the supremum (or maximum) of an element of $L_2$.⁴ To add to the confusion, mathematicians refer to elements of $L_2$ as "functions" (even though they are equivalence classes of functions), and they drop the square brackets. Things get even trickier when one deals with signals contaminated by noise. If one views the signals as elements of $L_2$, then the result of adding noise to them is not a stochastic process (Definition 12.2.1 ahead). We find this price too high, and in this book we shall mostly deal with $\mathcal{L}_2$.



4.8 Additional Reading 

Most of the results of this chapter follow from basic results on inner product spaces and can be found, for example, in (Axler, 1997). However, since $\mathcal{L}_2$ is not an inner-product space, we had to introduce some slight modifications.

More on the definition of the space $L_2$ can be found in most texts on analysis. See, for example, (Rudin, 1974, Chapter 3, Remark 3.10) and (Royden, 1988, Chapter 1 Section 7).

4 To deal with this, mathematicians define the essential supremum.

4.9 Exercises 

Exercise 4.1 (Linear Subspace). Consider the set of signals u of the form u : t >— > e - ' p(t), 
where $p(\cdot)$ is a polynomial whose degree does not exceed $d$. Is this a linear subspace of $\mathcal{L}_2$? If yes, find a basis for this subspace.

Exercise 4.2 (Characterizing Infinite-Dimensional Subspaces). Recall that we say that a linear subspace is infinite dimensional if it is not of finite dimension. Show that a linear subspace $\mathcal{U}$ is infinite dimensional if, and only if, there exists a sequence $\mathbf{u}_1, \mathbf{u}_2, \ldots$ of elements of $\mathcal{U}$ such that for every $n \in \mathbb{N}$ the tuple $(\mathbf{u}_1, \ldots, \mathbf{u}_n)$ is linearly independent.

Exercise 4.3 ($\mathcal{L}_2$ Is Infinite Dimensional). Show that $\mathcal{L}_2$ is infinite dimensional.
Hint: Exercises 4.1 and 4.2 may be useful.

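Exercise 4.4 (Separation between Signals). Given $\mathbf{u}_1, \mathbf{u}_2 \in \mathcal{L}_2$, let $\mathcal{V}$ be the set of all complex signals $\mathbf{v}$ that are equidistant to $\mathbf{u}_1$ and $\mathbf{u}_2$:

$$\mathcal{V} = \bigl\{\mathbf{v} \in \mathcal{L}_2 : \|\mathbf{v} - \mathbf{u}_1\|_2 = \|\mathbf{v} - \mathbf{u}_2\|_2\bigr\}.$$

(i) Show that

$$\mathcal{V} = \Bigl\{\mathbf{v} \in \mathcal{L}_2 : \operatorname{Re}\bigl(\langle \mathbf{v}, \mathbf{u}_2 - \mathbf{u}_1\rangle\bigr) = \frac{\|\mathbf{u}_2\|_2^2 - \|\mathbf{u}_1\|_2^2}{2}\Bigr\}.$$

(ii) Is $\mathcal{V}$ a linear subspace of $\mathcal{L}_2$?
(iii) Show that $(\mathbf{u}_1 + \mathbf{u}_2)/2 \in \mathcal{V}$.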

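Exercise 4.5 (Projecting a Signal). Let $\mathbf{u} \in \mathcal{L}_2$ be of positive energy, and let $\mathbf{v} \in \mathcal{L}_2$ be arbitrary.

(i) Show that Definitions 4.6.6 and 4.5.3 agree in the sense that the projection of $\mathbf{v}$ onto $\operatorname{span}(\mathbf{u})$ (according to Definition 4.6.6) is the same as the projection of $\mathbf{v}$ onto the signal $\mathbf{u}$ (according to Definition 4.5.3).

(ii) Show that if the signal $\mathbf{u}$ is an element of a finite-dimensional subspace $\mathcal{U}$ having an orthonormal basis, then the projection of $\mathbf{u}$ onto $\mathcal{U}$ is given by $\mathbf{u}$.

Exercise 4.6 (Orthogonal Subspace). Given signals $\mathbf{v}_1, \ldots, \mathbf{v}_n \in \mathcal{L}_2$, define the set

$$\mathcal{U} = \bigl\{\mathbf{u} \in \mathcal{L}_2 : \langle \mathbf{u}, \mathbf{v}_1\rangle = \langle \mathbf{u}, \mathbf{v}_2\rangle = \cdots = \langle \mathbf{u}, \mathbf{v}_n\rangle = 0\bigr\}.$$

Show that $\mathcal{U}$ is a linear subspace of $\mathcal{L}_2$.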

Exercise 4.7 (Constructing an Orthonormal Basis). Let $T_s$ be a positive constant. Consider the signals $\mathbf{s}_1\colon t \mapsto \mathrm{I}\{0 < t < T_s/2\} - \mathrm{I}\{T_s/2 < t < T_s\}$; $\mathbf{s}_2\colon t \mapsto \mathrm{I}\{0 < t < T_s\}$; $\mathbf{s}_3\colon t \mapsto \mathrm{I}\{0 < t < T_s/4\} + \mathrm{I}\{3T_s/4 < t < T_s\}$; and $\mathbf{s}_4\colon t \mapsto \mathrm{I}\{0 < t < T_s/4\} - \mathrm{I}\{3T_s/4 < t < T_s\}$.

(i) Plot $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$.

(ii) Find an orthonormal basis for $\operatorname{span}(\mathbf{s}_1, \mathbf{s}_2, \mathbf{s}_3, \mathbf{s}_4)$.

(iii) Express each of the signals $\mathbf{s}_1$, $\mathbf{s}_2$, $\mathbf{s}_3$, and $\mathbf{s}_4$ as a linear combination of the basis vectors found in Part (ii).

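Exercise 4.8 (Is the $\mathcal{L}_2$-Limit Unique?). Show that for signals $\boldsymbol{\zeta}, \mathbf{x}_1, \mathbf{x}_2, \ldots$ in $\mathcal{L}_2$ the statement

$$\lim_{n \to \infty} \|\mathbf{x}_n - \boldsymbol{\zeta}\|_2 = 0$$

is equivalent to the statement

$$\Bigl(\lim_{n \to \infty} \|\mathbf{x}_n - \tilde{\boldsymbol{\zeta}}\|_2 = 0\Bigr) \Leftrightarrow \bigl(\tilde{\boldsymbol{\zeta}} \in [\boldsymbol{\zeta}]\bigr).$$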

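Exercise 4.9 (Signals of Zero Energy). Given $\mathbf{v}_1, \ldots, \mathbf{v}_n \in \mathcal{L}_2$, show that there exist integers $1 \le \nu_1 < \nu_2 < \cdots < \nu_d \le n$ such that the following three conditions hold: the $d$-tuple $(\mathbf{v}_{\nu_1}, \ldots, \mathbf{v}_{\nu_d})$ is linearly independent; $\operatorname{span}(\mathbf{v}_{\nu_1}, \ldots, \mathbf{v}_{\nu_d})$ contains no signal of zero energy other than the all-zero signal $\mathbf{0}$; and each element of $\operatorname{span}(\mathbf{v}_1, \ldots, \mathbf{v}_n)$ is indistinguishable from some element of $\operatorname{span}(\mathbf{v}_{\nu_1}, \ldots, \mathbf{v}_{\nu_d})$.

Exercise 4.10 (Orthogonal Subspace). Given $\mathbf{v}_1, \ldots, \mathbf{v}_n \in \mathcal{L}_2$, define the set

$$\mathcal{U} = \bigl\{\mathbf{u} \in \mathcal{L}_2 : \langle \mathbf{u}, \mathbf{v}_1\rangle = \langle \mathbf{u}, \mathbf{v}_2\rangle = \cdots = \langle \mathbf{u}, \mathbf{v}_n\rangle = 0\bigr\},$$

and the set of all energy-limited signals that are orthogonal to all the signals in $\mathcal{U}$:

$$\mathcal{U}^{\perp} = \bigl\{\mathbf{w} \in \mathcal{L}_2 : \bigl(\langle \mathbf{w}, \mathbf{u}\rangle = 0, \; \mathbf{u} \in \mathcal{U}\bigr)\bigr\}.$$

(i) Show that $\mathcal{U}^{\perp}$ is a linear subspace of $\mathcal{L}_2$.

(ii) Show that an energy-limited signal is in $\mathcal{U}^{\perp}$ if, and only if, it is indistinguishable from some element of $\operatorname{span}(\mathbf{v}_1, \ldots, \mathbf{v}_n)$.

Hint: For Part (ii) you may find Exercise 4.9 useful.

Exercise 4.11 (More on Indistinguishability). Given $\mathbf{v}_1, \ldots, \mathbf{v}_n \in \mathcal{L}_2$ and some $\mathbf{w} \in \mathcal{L}_2$, propose an algorithm to check whether there exists an element of $\operatorname{span}(\mathbf{v}_1, \ldots, \mathbf{v}_n)$ that is indistinguishable from $\mathbf{w}$.

Hint: Exercise 4.9 may be useful.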



Chapter 5 

Convolutions and Filters 

5.1 Introduction 

Convolutions play a central role in the analysis of linear systems, and it is thus 
not surprising that they will appear repeatedly in this book. Most of the readers 
have probably seen the definition and key properties in an earlier course on linear 
systems, so this chapter can be viewed as a very short review. New perhaps is 
the following section on notation and the all-important Section 5.8 on the matched 
filter and its use in calculating inner products. 

5.2 Time Shifts and Reflections 

Suppose that $x\colon \mathbb{R} \to \mathbb{R}$ is a real signal, where we think of the argument as being time. Such functions are typically plotted on paper with the time arrow pointing to the right. Take a moment to plot an example of such a function, and on the same coordinates plot the function

$$t \mapsto x(t - t_0),$$

which maps every $t \in \mathbb{R}$ to $x(t - t_0)$ for some positive $t_0$. Repeat with $t_0$ being negative. This may seem like a mindless exercise but there is a point to it. It will help you understand convolutions graphically and help you visualize mappings such as $t \mapsto \sum_{\ell} \alpha_\ell \, g(t - \ell T_s)$, which we will encounter later in our study of Pulse Amplitude Modulation (PAM). It will also help you visualize the matched filter.

Given a complex signal $\mathbf{x}\colon \mathbb{R} \to \mathbb{C}$, we denote its reflection or mirror image by $\overleftarrow{\mathbf{x}}$:

$$\overleftarrow{\mathbf{x}}\colon t \mapsto x(-t). \tag{5.1}$$

Its plot is the mirror image of the plot of $x(\cdot)$ about the vertical axis. The mirror image of the mirror image of $\mathbf{x}$ is $\mathbf{x}$.

5.3 The Convolution Expression

The convolution $\mathbf{x} \star \mathbf{h}$ between two complex signals $\mathbf{x}\colon \mathbb{R} \to \mathbb{C}$ and $\mathbf{h}\colon \mathbb{R} \to \mathbb{C}$ is formally defined as the complex signal whose time-$t$ value $(\mathbf{x} \star \mathbf{h})(t)$ is given by

$$(\mathbf{x} \star \mathbf{h})(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau) \, d\tau. \tag{5.2}$$

Note that the integrand in the above is complex. (See Section 2.3 for a discussion of such integrals.) This definition also holds for real signals.

We used the term "formally defined" because certain conditions need to be met for this integral to be defined. It is conceivable that for some $t \in \mathbb{R}$ the integrand $\tau \mapsto x(\tau)\, h(t - \tau)$ will not be integrable, so the integral will be undefined. (Recall that in this book we only allow integrals of the form $\int_{-\infty}^{\infty} g(t)\, dt$ if the integrand $g(\cdot)$ is in $\mathcal{L}_1$ so $\int_{-\infty}^{\infty} |g(t)| \, dt < \infty$. Otherwise, we say that the integral $\int_{-\infty}^{\infty} g(t)\, dt$ is undefined.) We thus say that $\mathbf{x} \star \mathbf{h}$ is defined at $t \in \mathbb{R}$ if $\tau \mapsto x(\tau)\, h(t - \tau)$ is integrable.

While (5.2) does not make it apparent, the convolution is in fact symmetric in $\mathbf{x}$ and $\mathbf{h}$. Thus, the integral in (5.2) is defined for a given $t$ if, and only if, the integral

$$\int_{-\infty}^{\infty} h(\sigma)\, x(t - \sigma) \, d\sigma \tag{5.3}$$

is defined. And if both are defined, then their values are identical. This follows directly by the change of variable $\sigma = t - \tau$.
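The symmetry has a direct discrete-time analog, which the following sketch illustrates. It is an assumption-laden stand-in rather than the integral (5.2) itself: finite sequences replace signals, a sum replaces the integral, and the function name `convolve` is hypothetical.

```python
def convolve(x, h):
    """Full discrete convolution of two finite sequences:
    y[n] = sum_k x[k] * h[n - k], for n = 0, ..., len(x) + len(h) - 2."""
    y = [0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm
    return y

x = [1, 2 - 1j, 0, 3]
h = [0.5, -1, 4j]

xh = convolve(x, h)
hx = convolve(h, x)
print(xh)
print(hx)   # same values: swapping the roles of x and h changes nothing
```

Swapping the arguments corresponds to the change of variable that turns (5.2) into (5.3).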

5.4 Thinking About the Convolution 

Depending on the application, we can think about the convolution operation in a 
number of different ways. 

(i) Especially when $h(\cdot)$ is nonnegative and integrates to one, one can think of the convolution as an averaging, or smoothing, operation. Thus, when $\mathbf{x}$ is convolved with $\mathbf{h}$ the result at time $t_0$ is not $x(t_0)$ but rather a smoothed version thereof, namely, $\int_{-\infty}^{\infty} x(t_0 - \tau)\, h(\tau) \, d\tau$. For example, if $\mathbf{h}$ is the mapping $t \mapsto \mathrm{I}\{|t| < T/2\}/T$ for some $T > 0$, then the convolution $\mathbf{x} \star \mathbf{h}$ at time $t_0$ is not $x(t_0)$ but rather

$$\frac{1}{T} \int_{t_0 - T/2}^{t_0 + T/2} x(\tau) \, d\tau.$$

Thus, in this example, we can think of $\mathbf{x} \star \mathbf{h}$ as being a "moving average," or a "sliding-window average" of $\mathbf{x}$.

(ii) For energy-limited signals it is sometimes beneficial to think about $(\mathbf{x} \star \mathbf{h})(t_0)$ as the inner product between the functions $\tau \mapsto x(\tau)$ and $\tau \mapsto h^{*}(t_0 - \tau)$:

$$(\mathbf{x} \star \mathbf{h})(t_0) = \bigl\langle \tau \mapsto x(\tau), \; \tau \mapsto h^{*}(t_0 - \tau) \bigr\rangle. \tag{5.4}$$

(iii) Another useful informal way is to think about $\mathbf{x} \star \mathbf{h}$ as a limit of expressions of the form

$$\sum_{j} h(t_j)\, x(t - t_j), \tag{5.5}$$

i.e., as a limit of linear combinations of the time shifts of $\mathbf{x}$ where the coefficients are determined by $\mathbf{h}$.
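The moving-average interpretation in (i) can be sketched numerically. The code below is an approximation under stated assumptions: the integral is replaced by a midpoint Riemann sum with `n` terms, and the names `moving_average`, `ramp`, and `step` are hypothetical test inputs, not objects from the text.

```python
def moving_average(x, t0, T, n=10_000):
    """Approximate (1/T) * integral of x over [t0 - T/2, t0 + T/2]
    with a midpoint Riemann sum of n terms."""
    dt = T / n
    start = t0 - T / 2
    return sum(x(start + (k + 0.5) * dt) for k in range(n)) * dt / T

def ramp(t):
    return t

def step(t):
    return 1.0 if t >= 0 else 0.0

# Averaging a ramp over a symmetric window returns its value at the center.
avg_ramp = moving_average(ramp, t0=3.0, T=2.0)

# Averaging a unit step smooths the jump: the output is the fraction of the
# window that lies to the right of the jump.
avg_step_center = moving_average(step, t0=0.0, T=1.0)
avg_step_offset = moving_average(step, t0=0.25, T=1.0)

print(avg_ramp, avg_step_center, avg_step_offset)
```

The step example shows the smoothing effect: the convolved signal rises linearly from 0 to 1 over an interval of length `T` instead of jumping.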

5.5 When Is the Convolution Defined? 

There are a number of useful theorems providing sufficient conditions for the convolution's existence. These theorems can be classified into two kinds: those that guarantee that the convolution $\mathbf{x} \star \mathbf{h}$ is defined at every epoch $t \in \mathbb{R}$ and those that only guarantee that the convolution is defined for all epochs $t$ outside a set of Lebesgue measure zero. Both types are useful. We begin with the former.

Convolution defined for every $t \in \mathbb{R}$:

(i) A particularly simple case where the convolution is defined at every time instant $t$ is when both $\mathbf{x}$ and $\mathbf{h}$ are energy-limited:

$$\mathbf{x}, \mathbf{h} \in \mathcal{L}_2. \tag{5.6a}$$

In this case we can use (5.4) and the Cauchy-Schwarz Inequality (Theorem 3.3.1) to conclude that the integral in (5.2) is defined for every $t \in \mathbb{R}$ and that $\mathbf{x} \star \mathbf{h}$ is a bounded function with

$$\bigl|(\mathbf{x} \star \mathbf{h})(t)\bigr| \le \|\mathbf{x}\|_2 \, \|\mathbf{h}\|_2, \quad t \in \mathbb{R}. \tag{5.6b}$$

Indeed,

$$\begin{aligned}
\bigl|(\mathbf{x} \star \mathbf{h})(t)\bigr|
&= \bigl|\bigl\langle \tau \mapsto x(\tau), \; \tau \mapsto h^{*}(t - \tau)\bigr\rangle\bigr| \\
&\le \bigl\|\tau \mapsto x(\tau)\bigr\|_2 \, \bigl\|\tau \mapsto h^{*}(t - \tau)\bigr\|_2 \\
&= \|\mathbf{x}\|_2 \, \|\mathbf{h}\|_2.
\end{aligned}$$

In fact, it can be shown that the result of convolving two energy-limited signals is not only bounded but also uniformly continuous.¹ (See, for example, (Adams and Fournier, 2003, Paragraph 2.23).)

Note that even if both $\mathbf{x}$ and $\mathbf{h}$ are of finite energy, the convolution $\mathbf{x} \star \mathbf{h}$ need not be. However, if $\mathbf{x}$, $\mathbf{h}$ are both of finite energy and if one of them is additionally also integrable, then the convolution $\mathbf{x} \star \mathbf{h}$ is a finite energy signal. Indeed,

$$\|\mathbf{x} \star \mathbf{h}\|_2 \le \|\mathbf{h}\|_1 \, \|\mathbf{x}\|_2, \quad \mathbf{h} \in \mathcal{L}_1 \cap \mathcal{L}_2, \quad \mathbf{x} \in \mathcal{L}_2. \tag{5.7}$$

For a proof see, for example, (Rudin, 1974, Chapter 7, Exercise 4) or (Stein and Weiss, 1990, Chapter 1, Section 1, Theorem 1.3).



1 A function $s\colon \mathbb{R} \to \mathbb{C}$ is said to be uniformly continuous if for every $\epsilon > 0$ there corresponds some positive $\delta(\epsilon)$ such that $|s(\xi') - s(\xi'')|$ is smaller than $\epsilon$ whenever $\xi', \xi'' \in \mathbb{R}$ are such that $|\xi' - \xi''| < \delta(\epsilon)$.

(ii) Another simple case where the convolution is defined at every epoch $t \in \mathbb{R}$ is when one of the functions is measurable and bounded and when the other is integrable. For example, if

$$\mathbf{h} \in \mathcal{L}_1 \tag{5.8a}$$

and if $\mathbf{x}$ is a Lebesgue measurable function that is bounded in the sense that

$$|x(t)| \le \sigma_\infty, \quad t \in \mathbb{R} \tag{5.8b}$$

for some constant $\sigma_\infty$, then for every $t \in \mathbb{R}$ the integrand in (5.3) is integrable because $|h(\sigma)\, x(t - \sigma)| \le |h(\sigma)| \, \sigma_\infty$, with the latter being integrable by our assumption that $\mathbf{h}$ is integrable. The result of the convolution is a bounded function because

$$\begin{aligned}
\bigl|(\mathbf{x} \star \mathbf{h})(t)\bigr|
&= \Bigl|\int_{-\infty}^{\infty} h(\tau)\, x(t - \tau) \, d\tau\Bigr| \\
&\le \int_{-\infty}^{\infty} \bigl|h(\tau)\, x(t - \tau)\bigr| \, d\tau \\
&\le \sigma_\infty \, \|\mathbf{h}\|_1, \quad t \in \mathbb{R},
\end{aligned} \tag{5.8c}$$

where the first inequality follows from Proposition 2.4.1, and where the second inequality follows from (5.8b).

For this case too one can show that the result of the convolution is not only bounded but also uniformly continuous.

(iii) Using Hölder's Inequality, we can generalize the above two cases to show that whenever $\mathbf{x}$ and $\mathbf{h}$ satisfy the assumptions of Hölder's Inequality, their convolution is defined at every epoch $t \in \mathbb{R}$ and is, in fact, a bounded uniformly continuous function. See, for example, (Adams and Fournier, 2003, Paragraph 2.23).

(iv) Another important case where the convolution is defined at every time instant will be discussed in Proposition 6.2.5. There it is shown that the convolution between an integrable function (of time) with the Inverse Fourier Transform of an integrable function (of frequency) is defined at every time instant and has a simple representation. This scenario is not as contrived as the reader might suspect. It arises quite naturally, for example, when discussing the lowpass filtering of an integrable signal (Section 6.4.2). The impulse response of an ideal lowpass filter (LPF) is not integrable, but it can be represented as the Inverse Fourier Transform of an integrable function; see (6.35).

Regarding theorems that guarantee that the convolution be defined for every t 
outside a set of Lebesgue measure zero, we mention two. 

Convolution defined for t outside a set of Lebesgue measure zero: 

(i) If both x and h are integrable, then one can show (see, for example, (Rudin,
1974, Theorem 7.14), (Katznelson, 1976, Section VI.1), or (Stein and Weiss,
1990, Chapter 1, Section 1, Theorem 1.3)) that, for all t outside a set of
Lebesgue measure zero, the mapping $\tau \mapsto x(\tau)\,h(t-\tau)$ is integrable, so for
all such t the function $(x \star h)(t)$ is defined. Moreover, irrespective of how we
define $(x \star h)(t)$ for t inside the set of Lebesgue measure zero,

$$\|x \star h\|_1 \le \|x\|_1\,\|h\|_1, \quad x, h \in \mathcal{L}_1. \tag{5.9}$$

What is nice about this case is that the result of the convolution stays in 
the same class of integrable functions. This makes it meaningful to discuss 
associativity and other important properties of the convolution. 

(ii) Another case where the convolution is defined for all t outside a set of
Lebesgue measure zero is when h is integrable and when x is a measurable
function for which $\tau \mapsto |x(\tau)|^p$ is integrable for some $1 \le p < \infty$. In
this case we have (see, for example, (Rudin, 1974, Exercise 7.4) or (Stein and
Weiss, 1990, Chapter 1, Section 1, Theorem 1.3)) that for all t outside a set
of Lebesgue measure zero the mapping $\tau \mapsto x(\tau)\,h(t-\tau)$ is integrable, so for
such t the convolution $(x \star h)(t)$ is well-defined. Moreover, irrespective of
how we define $(x \star h)(t)$ for t inside the set of Lebesgue measure zero,

$$\left(\int_{-\infty}^{\infty} |(x \star h)(t)|^p\,dt\right)^{1/p} \le \|h\|_1 \left(\int_{-\infty}^{\infty} |x(t)|^p\,dt\right)^{1/p}. \tag{5.10}$$

This is written more compactly as

$$\|x \star h\|_p \le \|h\|_1\,\|x\|_p, \quad p \ge 1, \tag{5.11}$$

where we use the notation that for any measurable function g and $p > 0$

$$\|g\|_p \triangleq \left(\int_{-\infty}^{\infty} |g(t)|^p\,dt\right)^{1/p}. \tag{5.12}$$



5.6 Basic Properties of the Convolution 

The main properties of the convolution are summarized in the following theorem. 
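Theorem 5.6.1 (Properties of the Convolution). The convolution is

$$x \star h = h \star x, \quad \text{(commutative)}$$

$$(x \star g) \star h = x \star (g \star h), \quad \text{(associative)}$$

$$x \star (g + h) = x \star g + x \star h, \quad \text{(distributive)}$$

and linear in each of its arguments

$$x \star (\alpha g + \beta h) = \alpha (x \star g) + \beta (x \star h),$$
$$(\alpha g + \beta h) \star x = \alpha (g \star x) + \beta (h \star x),$$

where the above hold for all $g, h, x \in \mathcal{L}_1$ and $\alpha, \beta \in \mathbb{C}$.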

Some of these properties hold under more general or different sets of assumptions 
so the reader should focus here on the properties rather than on the restrictions. 
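
The properties in Theorem 5.6.1 can be sanity-checked numerically on finite-length sequences, which serve here as stand-ins for sampled integrable signals. The following is only a sketch under that assumption: the random test sequences and the scalars `alpha` and `beta` are arbitrary choices, and `np.convolve` computes the discrete (not the continuous-time) convolution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three complex test sequences of equal length (hypothetical stand-ins for x, g, h).
x, g, h = (rng.standard_normal(16) + 1j * rng.standard_normal(16) for _ in range(3))
alpha, beta = 2.0 - 1.0j, -0.5 + 3.0j

# x * h = h * x
commutative = np.allclose(np.convolve(x, h), np.convolve(h, x))

# (x * g) * h = x * (g * h)
associative = np.allclose(np.convolve(np.convolve(x, g), h),
                          np.convolve(x, np.convolve(g, h)))

# x * (alpha g + beta h) = alpha (x * g) + beta (x * h)
linear = np.allclose(np.convolve(x, alpha * g + beta * h),
                     alpha * np.convolve(x, g) + beta * np.convolve(x, h))

print(commutative, associative, linear)
```

A numerical check of this kind illustrates the identities but is of course no substitute for the proof.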
5.7 Filters

A filter of impulse response h is a physical device that when fed the input 
waveform x produces the output waveform h * x. The impulse response h is 
assumed to be a real or complex signal, and it is tacitly assumed that we only feed 
the device with inputs x for which the convolution x * h is defined. 2 

Definition 5.7.1 (Stable Filter). A filter is said to be stable if its impulse response 
is integrable. 

Stable filters are also called bounded-input/bounded-output stable or BIBO
stable, because, as the next proposition shows, if such filters are fed a bounded
signal, then their output is also a bounded signal.

Proposition 5.7.2 (BIBO Stability). If h is integrable and if x is a bounded
Lebesgue measurable signal, then the signal x * h is also bounded.

Proof. If the impulse response h is integrable, and if the input x is bounded by
some constant $\sigma_\infty$, then (5.8a) and (5.8b) are both satisfied, and the boundedness
of the output then follows from (5.8c). □
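
As an illustration of the bound (5.8c) behind BIBO stability, the sketch below approximates the convolution by a Riemann sum. The impulse response (a decaying exponential), the square-wave input, and the step size `dt` are all hypothetical choices made for this example only.

```python
import numpy as np

dt = 1e-2
t = np.arange(0.0, 20.0, dt)

h = np.exp(-t)                                  # integrable impulse response (example choice)
x = np.sign(np.sin(2 * np.pi * 0.3 * t))        # bounded input, |x(t)| <= 1

sigma_inf = np.max(np.abs(x))                   # bound on the input, as in (5.8b)
h_l1 = np.sum(np.abs(h)) * dt                   # Riemann-sum approximation of ||h||_1
y = np.convolve(x, h) * dt                      # Riemann-sum approximation of x * h

# Discrete counterpart of (5.8c): sup |y| <= sigma_inf * ||h||_1
bound_holds = np.max(np.abs(y)) <= sigma_inf * h_l1 + 1e-12
print(np.max(np.abs(y)), sigma_inf * h_l1, bound_holds)
```

The small slack `1e-12` only absorbs floating-point rounding; the discrete inequality itself follows from the triangle inequality exactly as in (5.8c).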

Definition 5.7.3 (Causal Filter). A filter of impulse response h is said to be causal
or nonanticipative if h is zero at negative times, i.e., if

$$h(t) = 0, \quad t < 0. \tag{5.13}$$

Causal filters play an important role in engineering because (5.13) guarantees that
the present filter output be computable from the past filter inputs. Indeed, the
time-t filter output can be expressed in the form

$$(x \star h)(t) = \int_{-\infty}^{\infty} x(\tau)\,h(t-\tau)\,d\tau = \int_{-\infty}^{t} x(\tau)\,h(t-\tau)\,d\tau, \quad h \text{ causal},$$

where the calculation of the latter integral only requires knowledge of $x(\tau)$ for
$\tau \le t$. Here the first equality follows from the definition of the convolution (5.2),
and the second equality follows from (5.13).
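
A discrete-time sketch of this observation: if two inputs agree up to some time index and differ only afterwards, a causal filter produces identical outputs up to that index. The filter taps, the inputs, and the index `n0` below are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

h = np.exp(-0.5 * np.arange(10))     # causal: h[k] is the response at lag k >= 0
x1 = rng.standard_normal(50)
x2 = x1.copy()
n0 = 30
x2[n0 + 1:] = rng.standard_normal(50 - n0 - 1)   # change only the inputs after time n0

y1 = np.convolve(x1, h)              # y[n] = sum_k h[k] x[n - k]
y2 = np.convolve(x2, h)

same_past = np.allclose(y1[: n0 + 1], y2[: n0 + 1])           # outputs up to n0 agree
differs_later = not np.allclose(y1[n0 + 1:], y2[n0 + 1:])     # later outputs may differ
print(same_past, differs_later)
```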

5.8 The Matched Filter 

In Digital Communications inner products are often computed using a matched 
filter. In its definition we shall use the notation (5.1). 



2 This definition of a filter is reminiscent of the concept of a "linear time invariant system." 
Note, however, that since we do not deal with Dirac's Delta in this book, our definition is more 
restrictive. For example, a device that produces at its output a waveform that is identical to its 
input is excluded from our discussion here because we do not allow h to be Dirac's Delta. 
Definition 5.8.1 (The Matched Filter). The matched filter for the signal $\phi$ is
a filter whose impulse response is $\overleftarrow{\phi}^{*}$, i.e., the mapping

$$t \mapsto \phi^{*}(-t). \tag{5.14}$$

The main use of the matched filter is for computing inner products: 

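Theorem 5.8.2 (Computing Inner Products with a Matched Filter). The inner
product $\langle u, \phi \rangle$ between the energy-limited signals u and $\phi$ is given by the output at
time $t = 0$ of a matched filter for $\phi$ that is fed u:

$$\langle u, \phi \rangle = \bigl(u \star \overleftarrow{\phi}^{*}\bigr)(0), \quad u, \phi \in \mathcal{L}_2. \tag{5.15}$$

More generally, if $g\colon t \mapsto \phi(t - t_0)$, then $\langle u, g \rangle$ is the time-$t_0$ output corresponding
to feeding the waveform u to the matched filter for $\phi$:

$$\int_{-\infty}^{\infty} u(t)\,\phi^{*}(t - t_0)\,dt = \bigl(u \star \overleftarrow{\phi}^{*}\bigr)(t_0). \tag{5.16}$$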



Proof. We shall prove the second part of the theorem, i.e., (5.16); the first follows
from the second by setting $t_0 = 0$. We express the time-$t_0$ output of the matched
filter as:

$$\bigl(u \star \overleftarrow{\phi}^{*}\bigr)(t_0) = \int_{-\infty}^{\infty} u(\tau)\,\overleftarrow{\phi}^{*}(t_0 - \tau)\,d\tau = \int_{-\infty}^{\infty} u(\tau)\,\phi^{*}(\tau - t_0)\,d\tau,$$

where the first equality follows from the definition of convolution (5.2) and the
second from the definition of $\overleftarrow{\phi}^{*}$ as the conjugated mirror image of $\phi$. □

From the above theorem we see that if we wish to compute, say, the three inner
products $\langle u, g_1 \rangle$, $\langle u, g_2 \rangle$, and $\langle u, g_3 \rangle$ in the very special case where the functions
$g_1, g_2, g_3$ are all time shifts of the same waveform $\phi$, i.e., when $g_1\colon t \mapsto \phi(t - t_1)$,
$g_2\colon t \mapsto \phi(t - t_2)$, and $g_3\colon t \mapsto \phi(t - t_3)$, then we need only one filter, namely, the
matched filter for $\phi$. Indeed, we can feed u to the matched filter for $\phi$ and the
inner products $\langle u, g_1 \rangle$, $\langle u, g_2 \rangle$, and $\langle u, g_3 \rangle$ simply correspond to the filter's outputs
at times $t_1$, $t_2$, and $t_3$. One circuit computes all three inner products. This is so
exciting that it is worth repeating:

Corollary 5.8.3 (Computing Many Inner Products using One Filter). If the
energy-limited signals $\{g_j\}_{j=1}^{J}$ are all time shifts of the same signal $\phi$ in the sense
that

$$g_j\colon t \mapsto \phi(t - t_j), \quad j = 1, \ldots, J,$$

and if u is any energy-limited signal, then all J inner products

$$\langle u, g_j \rangle, \quad j = 1, \ldots, J$$

can be computed using one filter by feeding u to a matched filter for $\phi$ and sampling
the output at the appropriate times $t_1, \ldots, t_J$:

$$\langle u, g_j \rangle = \bigl(u \star \overleftarrow{\phi}^{*}\bigr)(t_j), \quad j = 1, \ldots, J. \tag{5.17}$$
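
The corollary has a direct discrete-time analogue, sketched below with sums in place of integrals. The sequences `phi` and `u`, their lengths, and the list of shifts are hypothetical; the only point is that one convolution with the conjugated, time-reversed `phi` yields every inner product with a shifted copy of `phi`.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N = 8, 64
phi = rng.standard_normal(L) + 1j * rng.standard_normal(L)   # waveform, supported on 0..L-1
u = rng.standard_normal(N) + 1j * rng.standard_normal(N)     # received signal

# Matched filter: conjugated mirror image of phi; it occupies time indices -(L-1)..0.
h = np.conj(phi[::-1])
out = np.convolve(u, h)          # out[k] is the filter output at time m = k - (L - 1)

def inner_with_shift(u, phi, m):
    """Direct computation of <u, g> for g[n] = phi[n - m]."""
    g = np.zeros(len(u), dtype=complex)
    g[m:m + len(phi)] = phi
    return np.sum(u * np.conj(g))

shifts = [0, 5, 17, 40]
direct = np.array([inner_with_shift(u, phi, m) for m in shifts])
sampled = np.array([out[m + L - 1] for m in shifts])   # sample the single filter output

print(np.allclose(direct, sampled))
```

The index offset `L - 1` only reflects that `np.convolve` stores the noncausal filter's output starting at array index zero.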



5.9 The Ideal Unit-Gain Lowpass Filter 

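The impulse response of the ideal unit-gain lowpass filter of cutoff frequency $W_c$
is denoted by $\mathrm{LPF}_{W_c}(\cdot)$ and is given for every $W_c > 0$ by 3

$$\mathrm{LPF}_{W_c}(t) = \begin{cases} 2W_c\,\dfrac{\sin(2\pi W_c t)}{2\pi W_c t} & \text{if } t \ne 0, \\ 2W_c & \text{if } t = 0, \end{cases} \quad t \in \mathbb{R}. \tag{5.18}$$

This can be alternatively written as

$$\mathrm{LPF}_{W_c}(t) = 2W_c\,\mathrm{sinc}(2W_c t), \quad t \in \mathbb{R}, \tag{5.19}$$

where the function sinc(·) is defined by 4

$$\mathrm{sinc}(\xi) = \begin{cases} \dfrac{\sin(\pi \xi)}{\pi \xi} & \text{if } \xi \ne 0, \\ 1 & \text{if } \xi = 0, \end{cases} \quad \xi \in \mathbb{R}. \tag{5.20}$$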

Notice that the definition of sinc(0) as being 1 makes sense because, for very small
(but nonzero) values of $\xi$ the value of $\sin(\xi)/\xi$ is approximately 1. In fact, with
this definition at zero the function is not only continuous at zero but also infinitely
differentiable there. Indeed, the function from $\mathbb{C}$ to $\mathbb{C}$

$$z \mapsto \begin{cases} \dfrac{\sin(\pi z)}{\pi z} & \text{if } z \ne 0, \\ 1 & \text{otherwise}, \end{cases}$$

is an entire function, i.e., an analytic function throughout the complex plane.

The importance of the ideal unit-gain lowpass filter will become clearer when we
discuss the filter's frequency response in Section 6.3. It is thus named because
the Fourier Transform of $\mathrm{LPF}_{W_c}(\cdot)$ is equal to 1 (hence "unit gain"), whenever
$|f| \le W_c$, and is equal to zero, whenever $|f| > W_c$. See (6.38) ahead.

From a mathematical point of view, working with the ideal unit-gain lowpass filter
is tricky because the impulse response (5.18) is not an integrable function. (It
decays like $1/t$, which does not have a finite integral from $t = 1$ to $t = \infty$.) This
filter is thus not a stable filter. We shall revisit this issue in Section 6.4. Note,
however, that the impulse response (5.18) is of finite energy. (The square of the
impulse response decays like $1/t^2$, which does have a finite integral from one to
infinity.) Consequently, the result of feeding an energy-limited signal to the ideal
unit-gain lowpass filter is always well-defined.

Note also that the ideal unit-gain lowpass filter is not causal. 
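
The contrast between "not integrable" and "of finite energy" can be seen numerically. The sketch below assumes NumPy's `np.sinc`, which uses the same normalized convention as (5.20); the cutoff `wc = 1`, the step `dt`, and the truncation windows `T` are arbitrary choices. Riemann sums over growing windows only suggest the limiting behavior; they do not prove it.

```python
import numpy as np

def lpf(t, wc):
    """Impulse response (5.19): 2*Wc*sinc(2*Wc*t), with np.sinc(x) = sin(pi x)/(pi x)."""
    return 2.0 * wc * np.sinc(2.0 * wc * t)

wc, dt = 1.0, 0.005
abs_integrals, energies = [], []
for T in (10.0, 100.0, 1000.0):
    t = np.arange(-T, T + dt, dt)
    v = lpf(t, wc)
    abs_integrals.append(np.sum(np.abs(v)) * dt)   # keeps growing as the window widens
    energies.append(np.sum(v ** 2) * dt)           # settles near 2*Wc

print(abs_integrals)
print(energies)
```

The energy settling near $2W_c$ is consistent with the unit-gain frequency response on a band of width $2W_c$.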



3 For convenience we define the impulse response of the ideal unit-gain lowpass filter of cutoff 
frequency zero as the all zero signal. This is in agreement with (5.19). 

4 Some texts omit the $\pi$'s in (5.20) and define the sinc(·) function as $\sin(\xi)/\xi$ for $\xi \ne 0$.
5.10 The Ideal Unit-Gain Bandpass Filter

The ideal unit-gain bandpass filter (BPF) of bandwidth W around the carrier
frequency $f_c$, where $f_c > W/2 > 0$, is a filter of impulse response $\mathrm{BPF}_{W,f_c}(\cdot)$,
where

$$\mathrm{BPF}_{W,f_c}(t) = 2W \cos(2\pi f_c t)\,\mathrm{sinc}(Wt), \quad t \in \mathbb{R}. \tag{5.21}$$

This filter too is nonstable and noncausal. It derives its name from its frequency
response (discussed in Section 6.3 ahead), which is equal to one at frequencies f
satisfying $\bigl||f| - f_c\bigr| \le W/2$ and which is equal to zero at all other frequencies.



5.11 Young's Inequality 

Many of the inequalities regarding convolutions are special cases of a result known 
as Young's Inequality. Recalling (5.12), we can state Young's Inequality as follows. 

Theorem 5.11.1 (Young's Inequality). Let x and h be measurable functions such
that $\|x\|_p, \|h\|_q < \infty$ for some $1 \le p, q < \infty$ satisfying $1/p + 1/q > 1$. Define r
through $1/p + 1/q = 1 + 1/r$. Then the convolution integral (5.2) is defined for all t
outside a set of Lebesgue measure zero; it is a measurable function; and

$$\|x \star h\|_r \le K\,\|x\|_p\,\|h\|_q, \tag{5.22}$$

where $K \le 1$ is some constant that depends only on p and q.

Proof. See (Adams and Fournier, 2003, Corollary 2.25). Alternatively, see (Stein 
and Weiss, 1990, Chapter 5, Section 1) where it is derived from the M. Riesz 
Convexity Theorem. □ 
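
A quick numerical illustration of (5.22), taking the constant to be $K = 1$ and replacing integrals by Riemann sums. The test signals, the grid, and the exponents $p = q = 4/3$ (so that $r = 2$) are hypothetical choices; the discretized inequality holds because Young's Inequality also holds for sequences.

```python
import numpy as np

def lp_norm(v, p, dt):
    """Riemann-sum approximation of the norm in (5.12)."""
    return (np.sum(np.abs(v) ** p) * dt) ** (1.0 / p)

dt = 0.01
t = np.arange(-5.0, 5.0, dt)
x = np.exp(-np.abs(t)) * np.cos(3.0 * t)      # example signal
h = (np.abs(t) <= 1.0).astype(float)          # example brickwall signal

p, q = 4.0 / 3.0, 4.0 / 3.0
r = 1.0 / (1.0 / p + 1.0 / q - 1.0)           # defined through 1/p + 1/q = 1 + 1/r

conv = np.convolve(x, h) * dt                 # Riemann-sum approximation of x * h
lhs = lp_norm(conv, r, dt)
rhs = lp_norm(x, p, dt) * lp_norm(h, q, dt)
print(r, lhs, rhs, lhs <= rhs)
```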



5.12 Additional Reading 

For some of the properties of the convolution and its use in the analysis of linear 
systems see (Oppenheim and Willsky, 1997) and (Kwakernaak and Sivan, 1991). 



5.13 Exercises 

Exercise 5.1 (Convolution of Delayed Signals). Let x and h be energy-limited signals.
Let $x_d\colon t \mapsto x(t - t_d)$ be the result of delaying x by some $t_d \in \mathbb{R}$. Show that

$$(x_d \star h)(t) = (x \star h)(t - t_d), \quad t \in \mathbb{R}.$$

Exercise 5.2 (The Convolution of Reflections). Let the signals x, y be such that their
convolution $(x \star y)(t)$ is defined at every $t \in \mathbb{R}$. Show that the convolution of their
reflections is also defined at every $t \in \mathbb{R}$ and that it is equal to the reflection of their
convolution:

$$(\overleftarrow{x} \star \overleftarrow{y})(t) = (x \star y)(-t), \quad t \in \mathbb{R}.$$
Exercise 5.3 (Convolving Brickwall Functions). For a given $a > 0$, compute the
convolution of the signal $t \mapsto I\{|t| \le a\}$ with itself.

Exercise 5.4 (The Convolution and Inner Products). Let y and $\phi$ be energy-limited
complex signals, and let h be an integrable complex signal. Argue that

$$\langle y, h \star \phi \rangle = \bigl\langle y \star \overleftarrow{h}^{*}, \phi \bigr\rangle.$$

Exercise 5.5 (The Convolution's Derivative). Let the signal $g\colon \mathbb{R} \to \mathbb{C}$ be differentiable,
and let $g'$ denote its derivative. Let $h\colon \mathbb{R} \to \mathbb{C}$ be another signal. Assume that $g$, $g'$,
and h are all bounded, continuous, and integrable. Show that $g \star h$ is differentiable and
that its derivative $(g \star h)'$ is given by $g' \star h$.

See (Korner, 1988, Chapter 53, Theorem 53.1). 

Exercise 5.6 (Continuity of the Convolution). Show that if the signals x and y are both
in $\mathcal{L}_2$ then their convolution is a continuous function.

Hint: Use the Cauchy-Schwarz Inequality and the fact that if $x \in \mathcal{L}_2$ and if we define
$x_\delta\colon t \mapsto x(t - \delta)$, then $\lim_{\delta \to 0} \|x - x_\delta\|_2 = 0$.

Exercise 5.7 (More on the Continuity of the Convolution). Let x and y be in $\mathcal{L}_2$. Let the
sequence of energy-limited signals $x_1, x_2, \ldots$ converge to x in the sense that $\|x - x_n\|_2$
tends to zero as n tends to infinity. Show that at every epoch $t \in \mathbb{R}$,

$$\lim_{n \to \infty} (x_n \star y)(t) = (x \star y)(t).$$

Hint: Use the Cauchy-Schwarz Inequality.

Exercise 5.8 (Convolving Bi-Infinite Sequences). The convolution of the bi-infinite
sequence $\ldots, a_{-1}, a_0, a_1, \ldots$ with the bi-infinite sequence $\ldots, b_{-1}, b_0, b_1, \ldots$ is the bi-infinite
sequence $\ldots, c_{-1}, c_0, c_1, \ldots$ formally defined by

$$c_m = \sum_{\nu = -\infty}^{\infty} a_\nu\, b_{m - \nu}, \quad m \in \mathbb{Z}. \tag{5.23}$$

Show that if

$$\sum_{\nu = -\infty}^{\infty} |a_\nu|, \ \sum_{\nu = -\infty}^{\infty} |b_\nu| < \infty,$$

then the sum on the RHS of (5.23) converges for every integer m, and

$$\sum_{m = -\infty}^{\infty} |c_m| \le \Biggl(\sum_{\nu = -\infty}^{\infty} |a_\nu|\Biggr) \Biggl(\sum_{\nu = -\infty}^{\infty} |b_\nu|\Biggr).$$

Hint: Recall Problems 3.10 & 3.9 and the Triangle Inequality for Complex Numbers.

Exercise 5.9 (Stability of the Matched Filter). Let g be an energy-limited signal. Under 
what conditions is the matched filter for g stable? 
Exercise 5.10 (Causality of the Matched Filter). Let g be an energy-limited signal.

(i) Under what conditions is the matched filter for g causal?

(ii) Under what conditions can you find a causal filter of impulse response h and a
sampling time $t_0$ such that

$$(r \star h)(t_0) = \langle r, g \rangle, \quad r \in \mathcal{L}_2?$$

(iii) Show that for every $\delta > 0$ we can find a stable causal filter of impulse response h
and a sampling epoch $t_0$ such that for every $r \in \mathcal{L}_2$

$$\bigl|(r \star h)(t_0) - \langle r, g \rangle\bigr| \le \delta\,\|r\|_2.$$

Exercise 5.11 (The Output of the Matched Filter). Compute and plot the output of the
matched filter for the signal $t \mapsto e^{-t}\,I\{t \ge 0\}$ when it is fed the input $t \mapsto I\{|t| \le 1/2\}$.



Chapter 6 

The Frequency Response of Filters and 
Bandlimited Signals 



6.1 Introduction 

We begin this chapter with a review of the Fourier Transform and its key properties. 
We then use these properties to define the frequency response of filters, to discuss 
the ideal unit-gain lowpass filter, and to define bandlimited signals. 



6.2 Review of the Fourier Transform 

6.2.1 On Hats, $2\pi$'s, $\omega$'s, and f's

We denote the Fourier Transform (FT) of a (possibly complex) signal $x(\cdot)$ by
$\hat{x}(\cdot)$. Some other books denote it by $X(\cdot)$, but we prefer our notation because,
where possible, we use lowercase letters for deterministic quantities and reserve
uppercase letters for random quantities. In places where convention forces us to
use uppercase letters for deterministic quantities, we try to use a special font, e.g.,
P for power, W for bandwidth, or A for a deterministic matrix.

More importantly, our definition of the Fourier Transform may be different from
the one you are used to.

Definition 6.2.1 (Fourier Transform). The Fourier Transform (or the $\mathcal{L}_1$-Fourier
Transform) of an integrable signal $x\colon \mathbb{R} \to \mathbb{C}$ is the mapping $\hat{x}\colon \mathbb{R} \to \mathbb{C}$
defined by

$$\hat{x}\colon f \mapsto \int_{-\infty}^{\infty} x(t)\,e^{-i 2\pi f t}\,dt. \tag{6.1}$$



(The FT can also be defined in more general settings. For example, in Section 6.2.3 
it will be defined via a limiting argument for finite-energy signals that are not 
integrable.) 
This definition should be contrasted with the definition

$$X(i\omega) = \int_{-\infty}^{\infty} x(t)\,e^{-i\omega t}\,dt, \tag{6.2}$$

which you may have seen before. Note the $2\pi$, which appears in the exponent in
our definition (6.1) and not in (6.2). We apologize to readers who are used to (6.2)
for forcing a new definition, but we have some good reasons:

(i) With our definition, the transform and its inverse are very similar; see (6.1)
and (6.4) below. If one uses the definition of (6.2), then the expression for
the Inverse Fourier Transform requires scaling the integral by $1/(2\pi)$.

(ii) With our definition, the Fourier Transform and the Inverse Fourier Trans- 
form of a symmetric function are the same; see (6.6). This simplifies the 
memorization of some Fourier pairs. 

(iii) As we shall state more precisely in Section 6.2.2 and Section 6.2.3, with our
definition the Fourier Transform possesses an extremely important property:
it preserves inner products

$$\langle u, v \rangle = \langle \hat{u}, \hat{v} \rangle \quad \text{(certain restrictions apply)}.$$

Again, no $2\pi$'s.

(iv) If $x(\cdot)$ models a function of time, then $\hat{x}(\cdot)$ becomes a function of frequency.
Thus, it is natural to use the generic argument t for such signals $x(\cdot)$ and the
generic argument f for their transforms. It is more common these days to
describe tones in terms of their frequencies (i.e., in Hz) and not in terms of
their radial frequency (in radians per second).

(v) It seems that all books on communications use our definition, perhaps because 
people are used to setting their radios in Hz, kHz, or MHz. 
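
To make the convention (6.1) concrete, the sketch below approximates the defining integral by a Riemann sum and applies it to the Gaussian $t \mapsto e^{-\pi t^2}$, whose transform under this convention is the same Gaussian $f \mapsto e^{-\pi f^2}$, with no $2\pi$ factors anywhere. The helper name `fourier_transform` and the grids are hypothetical choices for this illustration.

```python
import numpy as np

def fourier_transform(x_samples, t, f):
    """Riemann-sum approximation of (6.1): integral of x(t) exp(-i 2 pi f t) dt."""
    dt = t[1] - t[0]
    kernel = np.exp(-2j * np.pi * np.outer(f, t))   # kernel[k, n] = exp(-i 2 pi f_k t_n)
    return (kernel @ x_samples) * dt

t = np.arange(-8.0, 8.0, 0.01)
f = np.linspace(-3.0, 3.0, 61)      # symmetric frequency grid

x = np.exp(-np.pi * t ** 2)         # real-valued, integrable
x_hat = fourier_transform(x, t, f)

matches_gaussian = np.allclose(x_hat, np.exp(-np.pi * f ** 2), atol=1e-8)
# For a real signal the transform is conjugate-symmetric: x_hat(-f) = conj(x_hat(f)).
conjugate_symmetric = np.allclose(x_hat[::-1], np.conj(x_hat))
print(matches_gaussian, conjugate_symmetric)
```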

Plotting the FT of a signal is tricky, because it is a complex-valued function. This
is generally true even for real signals. However, for any integrable real signal
$x\colon \mathbb{R} \to \mathbb{R}$ the Fourier Transform $\hat{x}(\cdot)$ is conjugate-symmetric, i.e.,

$$\hat{x}(-f) = \hat{x}^{*}(f), \quad f \in \mathbb{R}, \quad \bigl(x \in \mathcal{L}_1 \text{ is real-valued}\bigr). \tag{6.3}$$

Equivalently, the magnitude of the FT of an integrable real signal is symmetric, and
the argument is anti-symmetric. 1 (The reverse statement is "essentially" correct.
If $\hat{x}$ is conjugate-symmetric then the set of epochs t for which $x(t)$ is not real is
of Lebesgue measure zero.) Consequently, when plotting the FT of a "generic"
real signal we shall plot a symmetric function, but with solid lines for the positive
frequencies and dashed lines for the negative frequencies. This is to remind the
reader that the FT of a real signal is not symmetric but conjugate symmetric. See,
for example, Figures 7.1 and 7.2 for plots of the Fourier Transforms of real signals.



1 The argument of a nonzero complex number z is defined as the element $\theta$ of $[-\pi, \pi)$ such
that $z = |z|\,e^{i\theta}$.
When plotting the FT of a complex-valued signal, we shall use a generic plot that
is "highly asymmetric," using solid lines. See, for example, Figure 7.4 for the FT
of a complex signal.

Definition 6.2.2 (Inverse Fourier Transform). The Inverse Fourier Transform
(IFT) of an integrable function $g\colon \mathbb{R} \to \mathbb{C}$ is denoted by $\check{g}$ and is defined by

$$\check{g}\colon t \mapsto \int_{-\infty}^{\infty} g(f)\,e^{i 2\pi f t}\,df. \tag{6.4}$$



We emphasize that the word "inverse" here is just part of the name of the transform. 
Applying the IFT to the FT of a signal does not always recover the signal. 2 (Condi- 
tions under which the IFT does recover the signal are explored in Theorem 6.2.13.) 
However, if one does not insist on using the IFT, then every integrable signal can 
be reconstructed to within indistinguishability from its FT; see Theorem 6.2.12. 

Proposition 6.2.3 (Some Properties of the Inverse Fourier Transform).

(i) If g is integrable, then its IFT is the FT of its mirror image

$$\check{g} = \hat{\overleftarrow{g}}, \quad g \in \mathcal{L}_1. \tag{6.5}$$

(ii) If g is integrable and also symmetric in the sense that $g = \overleftarrow{g}$, then the IFT
of g is equal to its FT

$$\check{g} = \hat{g}, \quad \bigl(g \in \mathcal{L}_1 \text{ and } g = \overleftarrow{g}\bigr). \tag{6.6}$$

(iii) If g is integrable and $\check{g}$ is also integrable, then

$$\hat{\check{g}} = \check{\hat{g}}. \tag{6.7}$$

Proof. Part (i) follows by a simple change of integration variable:

$$\check{g}(\xi) = \int_{-\infty}^{\infty} g(\alpha)\,e^{i 2\pi \alpha \xi}\,d\alpha = -\int_{\infty}^{-\infty} g(-\beta)\,e^{-i 2\pi \beta \xi}\,d\beta = \hat{\overleftarrow{g}}(\xi), \quad \xi \in \mathbb{R},$$

where we have changed the integration variable to $\beta = -\alpha$.



2 This can be seen by considering the signal $t \mapsto I\{t = 17\}$, which is zero everywhere except
at 17 where it takes on the value 1. Its FT is zero at all frequencies, but if one applies the IFT to
the all-zero function one obtains the all-zero function, which is not the function we started with.
Things could be much worse. The FT of some integrable signals (such as the signal $t \mapsto I\{|t| \le 1\}$)
is not integrable, so the IFT of their FT is not even defined.
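Part (ii) is a special case of Part (i). To prove Part (iii) we compute

$$\begin{aligned}
\hat{\check{g}}(\xi) &= \int_{-\infty}^{\infty} \left(\int_{-\infty}^{\infty} g(f)\,e^{i 2\pi f t}\,df\right) e^{-i 2\pi \xi t}\,dt \\
&= \int_{-\infty}^{\infty} \hat{g}(-t)\,e^{-i 2\pi \xi t}\,dt \\
&= \int_{-\infty}^{\infty} \hat{g}(\tau)\,e^{i 2\pi \xi \tau}\,d\tau \\
&= \check{\hat{g}}(\xi), \quad \xi \in \mathbb{R},
\end{aligned}$$

where we have changed the integration variable to $\tau = -t$. □

Identity (6.6) will be useful in Section 6.2.5 when we memorize the FT of the
Brickwall function $\xi \mapsto \beta\,I\{|\xi| \le \gamma\}$, which is symmetric. Once we succeed we will
also know its IFT.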

Table 6.1 summarizes some of the properties of the FT. Note that some of these 
properties require additional technical assumptions. 



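Property                                                Function                              Fourier Transform

linearity                                               $\alpha x + \beta y$                  $\alpha \hat{x} + \beta \hat{y}$
time shifting                                           $t \mapsto x(t - t_0)$                $f \mapsto e^{-i 2\pi f t_0}\,\hat{x}(f)$
frequency shifting                                      $t \mapsto e^{i 2\pi f_0 t}\,x(t)$    $f \mapsto \hat{x}(f - f_0)$
conjugation                                             $t \mapsto x^{*}(t)$                  $f \mapsto \hat{x}^{*}(-f)$
stretching ($\alpha \in \mathbb{R}$, $\alpha \ne 0$)    $t \mapsto x(\alpha t)$               $f \mapsto \frac{1}{|\alpha|}\,\hat{x}\bigl(\frac{f}{\alpha}\bigr)$
convolution in time                                     $x \star y$                           $f \mapsto \hat{x}(f)\,\hat{y}(f)$
multiplication in time                                  $t \mapsto x(t)\,y(t)$                $\hat{x} \star \hat{y}$
real part                                               $t \mapsto \mathrm{Re}\bigl(x(t)\bigr)$   $f \mapsto \frac{1}{2}\hat{x}(f) + \frac{1}{2}\hat{x}^{*}(-f)$
time reflection                                         $\overleftarrow{x}$                   $\overleftarrow{\hat{x}}$
transforming twice                                      $\hat{x}$                             $\overleftarrow{x}$
FT of IFT                                               $\check{x}$                           $x$

Table 6.1: Basic properties of the Fourier Transform. Some restrictions apply!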
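
Two rows of Table 6.1 (time shifting and stretching) can be spot-checked numerically. This is only a sketch: the helper `ft` approximates (6.1) by a Riemann sum, and the test signal, the shift `t0`, and the stretch factor `a` are hypothetical choices for which the Riemann sum happens to be very accurate.

```python
import numpy as np

def ft(x_samples, t, f):
    """Riemann-sum approximation of the Fourier Transform (6.1)."""
    dt = t[1] - t[0]
    return (np.exp(-2j * np.pi * np.outer(f, t)) @ x_samples) * dt

t = np.arange(-12.0, 12.0, 0.01)
f = np.linspace(-2.0, 2.0, 41)

def x(s):
    """Example signal: smooth and rapidly decaying."""
    return np.exp(-np.pi * s ** 2) * (1.0 + 0.5 * s)

def x_hat(freq):
    return ft(x(t), t, freq)

t0, a = 1.5, -2.0

# Time shifting: t -> x(t - t0) has FT f -> exp(-i 2 pi f t0) x_hat(f).
shift_lhs = ft(x(t - t0), t, f)
shift_rhs = np.exp(-2j * np.pi * f * t0) * x_hat(f)

# Stretching: t -> x(a t) has FT f -> x_hat(f / a) / |a|.
stretch_lhs = ft(x(a * t), t, f)
stretch_rhs = x_hat(f / a) / abs(a)

print(np.allclose(shift_lhs, shift_rhs, atol=1e-8),
      np.allclose(stretch_lhs, stretch_rhs, atol=1e-8))
```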



6.2.2 Parseval-like Theorems 

A key result on the Fourier Transform is that, subject to some restrictions, it
preserves inner products. Thus, if $\hat{x}_1$ and $\hat{x}_2$ are the Fourier Transforms of $x_1$ and $x_2$,
then the inner product $\langle x_1, x_2 \rangle$ between $x_1$ and $x_2$ is typically equal to the inner
product $\langle \hat{x}_1, \hat{x}_2 \rangle$ between their transforms. In this section we shall describe two
scenarios where this holds. A third scenario, which is described in Theorem 6.2.9,
will have to wait until we discuss the FT of signals that are energy-limited but not
integrable.
To see how the next proposition is related to the preservation of the inner product
under the Fourier Transform, think about g as being a function of frequency and
of its IFT $\check{g}$ as a function of time.

Proposition 6.2.4. If $g\colon f \mapsto g(f)$ and $x\colon t \mapsto x(t)$ are integrable mappings from $\mathbb{R}$
to $\mathbb{C}$, then

$$\int_{-\infty}^{\infty} x(t)\,\check{g}^{*}(t)\,dt = \int_{-\infty}^{\infty} \hat{x}(f)\,g^{*}(f)\,df, \tag{6.8}$$

i.e.,

$$\langle x, \check{g} \rangle = \langle \hat{x}, g \rangle, \quad g, x \in \mathcal{L}_1. \tag{6.9}$$

Proof. The key to the proof is to use Fubini's Theorem to justify changing the 
order of integration in the following calculation: 

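$$\begin{aligned}
\int_{-\infty}^{\infty} x(t)\,\check{g}^{*}(t)\,dt &= \int_{-\infty}^{\infty} x(t) \left(\int_{-\infty}^{\infty} g(f)\,e^{i 2\pi f t}\,df\right)^{*} dt \\
&= \int_{-\infty}^{\infty} x(t) \int_{-\infty}^{\infty} g^{*}(f)\,e^{-i 2\pi f t}\,df\,dt \\
&= \int_{-\infty}^{\infty} g^{*}(f) \int_{-\infty}^{\infty} x(t)\,e^{-i 2\pi f t}\,dt\,df \\
&= \int_{-\infty}^{\infty} g^{*}(f)\,\hat{x}(f)\,df,
\end{aligned}$$

where the first equality follows from the definition of $\check{g}$; the second because the
conjugation of an integral is accomplished by conjugating the integrand
(Proposition 2.3.1); the third by changing the order of integration; and the final equality
by the definition of the FT of x. □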
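
The calculation in the proof can be mirrored on a grid, where integrals become finite sums and Fubini's Theorem reduces to exchanging the order of two finite summations. In the sketch below the signal `x`, the function `g`, and both grids are hypothetical choices; the two inner products therefore agree up to rounding, which illustrates the mechanics of the proof rather than providing independent evidence for (6.9).

```python
import numpy as np

dt, df = 0.02, 0.02
t = np.arange(-8.0, 8.0, dt)          # time grid
f = np.arange(-8.0, 8.0, df)          # frequency grid

x = np.exp(-np.pi * t ** 2) * (1.0 + 1j * t)    # example integrable function of time
g = np.exp(-np.pi * (f - 1.0) ** 2)             # example integrable function of frequency

E = np.exp(-2j * np.pi * np.outer(f, t))        # E[k, n] = exp(-i 2 pi f_k t_n)
x_hat = (E @ x) * dt                            # FT of x, per (6.1)
g_check = (np.conj(E).T @ g) * df               # IFT of g, per (6.4)

lhs = np.sum(x * np.conj(g_check)) * dt         # <x, IFT of g>
rhs = np.sum(x_hat * np.conj(g)) * df           # <FT of x, g>
print(lhs, rhs, np.isclose(lhs, rhs))
```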

A related result is that the convolution of an integrable function with the IFT of 
an integrable function is always defined: 

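Proposition 6.2.5. If the mappings $x\colon t \mapsto x(t)$ and $g\colon f \mapsto g(f)$ from $\mathbb{R}$ to $\mathbb{C}$ are
both integrable, then the convolution $x \star \check{g}$ is defined at every epoch $t \in \mathbb{R}$ and

$$(x \star \check{g})(t) = \int_{-\infty}^{\infty} g(f)\,\hat{x}(f)\,e^{i 2\pi f t}\,df, \quad t \in \mathbb{R}. \tag{6.10}$$

Proof. Here too the key is in changing the order of integration:

$$\begin{aligned}
(x \star \check{g})(t) &= \int_{-\infty}^{\infty} x(\tau)\,\check{g}(t - \tau)\,d\tau \\
&= \int_{-\infty}^{\infty} x(\tau) \int_{-\infty}^{\infty} e^{i 2\pi f (t - \tau)}\,g(f)\,df\,d\tau \\
&= \int_{-\infty}^{\infty} g(f)\,e^{i 2\pi f t} \int_{-\infty}^{\infty} x(\tau)\,e^{-i 2\pi f \tau}\,d\tau\,df \\
&= \int_{-\infty}^{\infty} g(f)\,\hat{x}(f)\,e^{i 2\pi f t}\,df,
\end{aligned}$$

where the first equality follows from the definition of the convolution; the second
from the definition of the IFT; the third by changing the order of integration; and
the final equality by the definition of the FT. The justification of the changing of the
order of integration can be argued using Fubini's Theorem because, by assumption,
both g and x are integrable. □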

We next present another useful version of the preservation of inner products under
the FT. It is useful for functions (of time) that are zero outside some interval
$[-T, T]$ or for the IFT of functions (of frequency) that are zero outside an interval
$[-W, W]$.

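Proposition 6.2.6 (A Mini Parseval Theorem).

(i) Let the signals $x_1$ and $x_2$ be given by

$$x_\nu(t) = \int_{-\infty}^{\infty} g_\nu(f)\,e^{i 2\pi f t}\,df, \quad (t \in \mathbb{R},\ \nu = 1, 2), \tag{6.11a}$$

where the functions $g_\nu\colon f \mapsto g_\nu(f)$ satisfy

$$g_\nu(f) = 0, \quad (|f| > W,\ \nu = 1, 2), \tag{6.11b}$$

for some $W > 0$, and

$$\int_{-\infty}^{\infty} |g_\nu(f)|^2\,df < \infty, \quad \nu = 1, 2. \tag{6.11c}$$

Then

$$\langle x_1, x_2 \rangle = \langle g_1, g_2 \rangle. \tag{6.11d}$$

(ii) Let $g_1$ and $g_2$ be given by

$$g_\nu(f) = \int_{-\infty}^{\infty} x_\nu(t)\,e^{-i 2\pi f t}\,dt, \quad (f \in \mathbb{R},\ \nu = 1, 2), \tag{6.12a}$$

where the signals $x_1, x_2 \in \mathcal{L}_2$ are such that for some $T > 0$

$$x_\nu(t) = 0, \quad (|t| > T,\ \nu = 1, 2). \tag{6.12b}$$

Then

$$\langle x_1, x_2 \rangle = \langle g_1, g_2 \rangle. \tag{6.12c}$$

Proof. See the proof of Lemma A.3.6 on Page 693 and its corollary in the appendix. □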
6.2.3 The $L_2$-Fourier Transform

To appreciate some of the mathematical subtleties of this section, the reader is
encouraged to review Section 4.7 in order to recall the difference between the
space $L_2$ and the space $\mathcal{L}_2$ and in order to recall the difference between an
energy-limited signal $x \in \mathcal{L}_2$ and the equivalence class $[x] \in L_2$ to which it belongs. In this
section we shall sketch how the Fourier Transform is defined for elements of $L_2$.
This section can be skipped provided that you are willing to take on faith that
such a transform exists and that, very roughly speaking, it has some of the same
properties of the Fourier Transform of Definition 6.2.1. To differentiate between
the transform of Definition 6.2.1 and the transform that we are about to define
for elements of $L_2$, we shall refer in this section to the former as the $\mathcal{L}_1$-Fourier
Transform and to the latter as the $L_2$-Fourier Transform. Both will be denoted
by a "hat." In subsequent sections the Fourier Transform will be understood to be
the $\mathcal{L}_1$-Fourier Transform unless explicitly otherwise specified.

Some readers may have already encountered the $L_2$-Fourier Transform without
even being aware of it. For example, the sinc(·) function, which is defined in (5.20),
is an energy-limited signal that is not integrable. Consequently, its $\mathcal{L}_1$-Fourier
Transform is undefined. Nevertheless, you may have seen its Fourier Transform
being given as the Brickwall function. As we shall see, this is somewhat in line
with how the $L_2$-Fourier Transform of the sinc(·) is defined. 3 For more on the
Fourier Transform of the sinc(·) see Section 6.2.5. Another example of an
energy-limited signal that is not integrable is $t \mapsto 1/(1 + |t|)$.

We next sketch how the $L_2$-Fourier Transform is defined and explore some of its
key properties. We begin with the bad news.

(i) There is no explicit simple expression for the $L_2$-Fourier Transform.

(ii) The result of applying the transform is not a function but an equivalence
class of functions.

The $L_2$-Fourier Transform is a mapping

$$\hat{\ }\colon L_2 \to L_2$$

that maps elements of $L_2$ to elements of $L_2$. It thus maps equivalence classes
to equivalence classes, not functions. As long as the operation we perform on
the result of the $L_2$-Fourier Transform does not depend on which member of the
equivalence class it is performed on, there is no need to worry about this issue.
Otherwise, we can end up performing operations that are ill-defined. For example,
an operation that is ill-defined is evaluating the result of the transform at a given
frequency, say at $f = 17$.

An operation you cannot go wrong with is integration, because the integrals of
two functions that differ on a set of measure zero are equal; see Proposition 2.5.3.
Consequently, inner products, which are defined via integration, are fine too. In



3 However, as we shall see, the result of the $L_2$-Fourier Transform is an element of $L_2$, i.e., an
equivalence class, and not a function.
this book we shall therefore refrain from applying to the result of the $L_2$-Fourier
Transform any operation other than integration (or related operations such as the
computation of energy or inner product). In fact, since we find the notion of
equivalence classes somewhat abstract we shall try to minimize its use.

Suppose that $x \in \mathcal{L}_2$ is an energy-limited signal and that $[x] \in L_2$ is its equivalence
class. How do we define the $L_2$-Fourier Transform of $[x]$? We first define for every
positive integer n the time-truncated function

$$x_n\colon t \mapsto x(t)\,I\{|t| \le n\}$$

and note that, by Proposition 3.4.3, $x_n$ is integrable. Consequently, its $\mathcal{L}_1$-Fourier
Transform $\hat{x}_n$ is well-defined and is given by

$$\hat{x}_n(f) = \int_{-n}^{n} x(t)\,e^{-i 2\pi f t}\,dt, \quad f \in \mathbb{R}.$$

We then note that $\|x - x_n\|_2$ tends to zero as n tends to infinity, so for every $\epsilon > 0$
there exists some $L(\epsilon)$ sufficiently large so that

$$\|x_n - x_m\|_2 < \epsilon, \quad n, m > L(\epsilon). \tag{6.13}$$

Applying Proposition 6.2.6 (ii) with the substitution of $\max\{n, m\}$ for T and of
$x_n - x_m$ for both $x_1$ and $x_2$, we obtain that (6.13) implies

$$\|\hat{x}_n - \hat{x}_m\|_2 < \epsilon, \quad n, m > L(\epsilon). \tag{6.14}$$

Because the space of energy-limited signals is complete in the sense of
Theorem 8.5.1 ahead, we may infer from (6.14) that there exists some function $\zeta \in \mathcal{L}_2$
such that $\|\hat{x}_n - \zeta\|_2$ converges to zero. 4 We then define the $L_2$-Fourier Transform
of the equivalence class $[x]$ to be the equivalence class $[\zeta]$. In view of Footnote 4
we can define the $L_2$-Fourier Transform as follows.

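Definition 6.2.7 ($L_2$-Fourier Transform). The $L_2$-Fourier Transform of the
equivalence class $[x] \in L_2$ is denoted by $\widehat{[x]}$ and is given by

$$\widehat{[x]} \triangleq \left\{ g \in \mathcal{L}_2 : \lim_{n \to \infty} \int_{-\infty}^{\infty} \left| g(f) - \int_{-n}^{n} x(t)\,e^{-i 2\pi f t}\,dt \right|^2 df = 0 \right\}.$$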
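
The limiting construction can be watched numerically for the sinc(·) function of (5.20), whose transform is commonly given as the Brickwall function $f \mapsto I\{|f| \le 1/2\}$. The sketch below computes the transforms of the truncations $x_n$ for a few values of n and measures their $\mathcal{L}_2$ distance from that Brickwall function on a finite frequency window. The grids, the window $[-2, 2]$, and the chosen values of n are hypothetical, and the Riemann sums only approximate the integrals.

```python
import numpy as np

dt, df = 0.02, 0.002
f = np.arange(-2.0, 2.0 + df, df)                 # finite frequency window
brickwall = (np.abs(f) <= 0.5).astype(float)      # candidate limit for the FT of sinc

errors = []
for n in (2, 8, 32):
    t = np.arange(0.0, n + dt / 2, dt)
    w = np.full(t.shape, dt)
    w[0] = w[-1] = dt / 2                         # trapezoidal weights on [0, n]
    # sinc is even, so the FT of its truncation is 2 * integral_0^n sinc(t) cos(2 pi f t) dt.
    x_n_hat = 2.0 * (np.cos(2 * np.pi * np.outer(f, t)) @ (np.sinc(t) * w))
    errors.append(np.sqrt(np.sum(np.abs(x_n_hat - brickwall) ** 2) * df))

print(errors)   # the distances shrink as n grows
```

Note that the convergence is in energy only: near $|f| = 1/2$ the truncated transforms keep oscillating, which is one reason why evaluating the result of the $L_2$-Fourier Transform at a single frequency is not meaningful.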



The main properties of the $L_2$-Fourier Transform are summarized in the following
theorem.

Theorem 6.2.8 (Properties of the $L_2$-Fourier Transform). The $L_2$-Fourier
Transform is a mapping from $L_2$ onto $L_2$ with the following properties:

(i) If $x \in \mathcal{L}_2 \cap \mathcal{L}_1$, then the $L_2$-Fourier Transform of $[x]$ is the equivalence class
of the mapping

$$f \mapsto \int_{-\infty}^{\infty} x(t)\,e^{-i 2\pi f t}\,dt.$$



4 The function $\zeta$ is not unique. If $\|\hat{x}_n - \zeta\|_2 \to 0$, then also $\|\hat{x}_n - \tilde{\zeta}\|_2 \to 0$ whenever $\tilde{\zeta} \in [\zeta]$.
And conversely, if $\|\hat{x}_n - \zeta\|_2 \to 0$ and $\|\hat{x}_n - \tilde{\zeta}\|_2 \to 0$, then $\tilde{\zeta}$ must be in $[\zeta]$.

(ii) The $L_2$-Fourier Transform is linear in the sense that

$$\widehat{\alpha [x_1] + \beta [x_2]} = \alpha\,\widehat{[x_1]} + \beta\,\widehat{[x_2]}, \quad \bigl(x_1, x_2 \in \mathcal{L}_2,\ \alpha, \beta \in \mathbb{C}\bigr).$$

(iii) The $L_2$-Fourier Transform is invertible in the sense that to each $[g] \in L_2$
there corresponds a unique equivalence class in $L_2$ whose $L_2$-Fourier
Transform is $[g]$. This equivalence class can be obtained by reflecting each of the
elements of $[g]$ to obtain the equivalence class $[\overleftarrow{g}]$ of $\overleftarrow{g}$, and by then applying
the $L_2$-Fourier Transform to it. The result $\widehat{[\overleftarrow{g}]}$ then satisfies

$$\widehat{\widehat{[\overleftarrow{g}]}} = [g], \quad g \in \mathcal{L}_2. \tag{6.15}$$

(iv) Applying the $L_2$-Fourier Transform twice is equivalent to reflecting the
elements of the equivalence class

$$\widehat{\widehat{[x]}} = [\overleftarrow{x}], \quad x \in \mathcal{L}_2. \tag{6.16}$$

(v) The $L_2$-Fourier Transform preserves energies: 5

$$\bigl\| \widehat{[x]} \bigr\|_2 = \bigl\| [x] \bigr\|_2, \quad x \in \mathcal{L}_2. \tag{6.17}$$

(vi) The $L_2$-Fourier Transform preserves inner products: 6

$$\bigl\langle [x], [y] \bigr\rangle = \bigl\langle \widehat{[x]}, \widehat{[y]} \bigr\rangle, \quad x, y \in \mathcal{L}_2. \tag{6.18}$$



Proof. This theorem is a restatement of (Rudin, 1974, Chapter 9, Theorem 9.13). 
Identity (6.16) appears in this form in (Stein and Weiss, 1990, Chapter 1, Section 2, 
Theorem 2.4). □ 

The result that the $L_2$-Fourier Transform preserves energies is sometimes called
Plancherel's Theorem and the result that it preserves inner products Parseval's
Theorem. We shall use "Parseval's Theorem" for both. It is so important that
we repeat it here in the form of a theorem. Following mathematical practice, we
drop the square brackets in the theorem's statement.

Theorem 6.2.9 (Parseval's Theorem). For any $x, y \in L_2$

$$\langle x, y \rangle = \langle \hat{x}, \hat{y} \rangle \tag{6.19}$$

and

$$\|x\|_2 = \|\hat{x}\|_2. \tag{6.20}$$



5 The energy of an equivalence class was defined in Section 4.7. 

6 The inner product between equivalence classes was defined in Section 4.7. 
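
The energy-preservation part of Parseval's Theorem can be checked numerically.
The following Python sketch compares the energy of a signal with the energy of
its Fourier Transform. The Gaussian test signal $t \mapsto e^{-\pi t^2}$, the truncation
limits, and the helper names (gaussian, fourier_transform, energy) are
assumptions made for this illustration only; the integrals are approximated by
the midpoint rule.

```python
import cmath
import math


def gaussian(t):
    """Test signal x(t) = exp(-pi t^2); integrable and energy-limited."""
    return math.exp(-math.pi * t * t)


def fourier_transform(x, f, t_max=6.0, n=1200):
    """Midpoint-rule approximation of the integral of x(t) e^{-i 2 pi f t} dt."""
    dt = 2.0 * t_max / n
    total = 0j
    for k in range(n):
        t = -t_max + (k + 0.5) * dt
        total += x(t) * cmath.exp(-2j * math.pi * f * t)
    return total * dt


def energy(values, step):
    """Riemann-sum approximation of the integral of |v|^2."""
    return sum(abs(v) ** 2 for v in values) * step


# Energy computed in the time domain.
n_t, t_max = 1200, 6.0
dt = 2.0 * t_max / n_t
time_energy = energy([gaussian(-t_max + (k + 0.5) * dt) for k in range(n_t)], dt)

# Energy computed in the frequency domain from the numerical transform.
n_f, f_max = 120, 3.0
df = 2.0 * f_max / n_f
freq_energy = energy(
    [fourier_transform(gaussian, -f_max + (k + 0.5) * df) for k in range(n_f)], df
)

print(time_energy, freq_energy)  # both should be close to 1/sqrt(2)
```

For this particular signal both energies equal $\int e^{-2\pi t^2}\, dt = 1/\sqrt{2}$, so the
two printed numbers should agree up to the quadrature error.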
As we mentioned earlier, there is no simple explicit expression for the L2-Fourier
Transform. The following proposition simplifies its calculation under certain
assumptions that are, for example, satisfied by the sinc(·) function.

Proposition 6.2.10. If $x = \check{g}$ for some $g \in \mathcal{L}_1 \cap \mathcal{L}_2$, then:

(i) $x \in \mathcal{L}_2$.

(ii) $\|x\|_2 = \|g\|_2$.

(iii) The L2-Fourier Transform of $[x]$ is the equivalence class $[g]$.

Proof. It suffices to prove Part (iii) because Parts (i) and (ii) will then follow from
the preservation of energy under the L2-Fourier Transform (Theorem 6.2.8 (v)).
To prove Part (iii) we compute

$$[g] = \widehat{\widehat{[\overleftarrow{g}]}} = \widehat{\bigl[\,\hat{\overleftarrow{g}}\,\bigr]} = \widehat{[x]},$$

where the first equality follows from (6.15); the second from Theorem 6.2.8 (i)
(because the hypothesis $g \in \mathcal{L}_1 \cap \mathcal{L}_2$ implies that $\overleftarrow{g} \in \mathcal{L}_1 \cap \mathcal{L}_2$); and the final
equality from Proposition 6.2.3 (i) and from the hypothesis that $x = \check{g}$. □



6.2.4 More on the Fourier Transform 

In this section we present additional results that shed some light on the problem of 
reconstructing a signal from its FT. The first is a continuity result, which may seem 
technical but which has some useful consequences. It can be used to show that the 
IFT (of an integrable function) always yields a continuous signal. Consequently, 
if one starts with a discontinuous function, takes its FT, and then the IFT, one 
does not obtain the original function. It can also be used — once we define the 
frequency response of a filter in Section 6.3 — to show that no stable filter can have 
a discontinuous frequency response. 

Theorem 6.2.11 (Continuity and Boundedness of the Fourier Transform).

(i) If x is integrable, then its FT $\hat{x}$ is a uniformly continuous function satisfying

$$|\hat{x}(f)| \le \int_{-\infty}^{\infty} |x(t)|\, dt, \quad f \in \mathbb{R}, \tag{6.21}$$

and

$$\lim_{|f| \to \infty} \hat{x}(f) = 0. \tag{6.22}$$
(ii) If g is integrable, then its IFT $\check{g}$ is a uniformly continuous function satisfying

$$|\check{g}(t)| \le \int_{-\infty}^{\infty} |g(f)|\, df, \quad t \in \mathbb{R}. \tag{6.23}$$



Proof. We begin with Part (i). Inequality (6.21) follows directly from the definition
of the FT and from Proposition 2.4.1. The proof of the uniform continuity of $\hat{x}$ is
not very difficult but is omitted. See (Katznelson, 1976, Section VI.1, Theorem 1.2).
A proof of (6.22) can be found in (Katznelson, 1976, Section VI.1, Theorem 1.7).

Part (ii) follows by substituting $\overleftarrow{g}$ for x in Part (i) because the IFT of g is the FT
of its mirror image (6.5). □

The second result we present is that every integrable signal can be reconstructed
from its FT, but not necessarily via the IFT. The reconstruction formula in (6.25)
ahead works even when the IFT does not do the job.

Theorem 6.2.12 (Reconstructing a Signal from Its Fourier Transform).

(i) If two integrable signals have the same FT, then they are indistinguishable:

$$\bigl( \hat{x}_1(f) = \hat{x}_2(f), \ f \in \mathbb{R} \bigr) \implies \bigl( x_1 \equiv x_2 \bigr), \quad x_1, x_2 \in \mathcal{L}_1. \tag{6.24}$$

(ii) Every integrable function x can be reconstructed from its FT in the sense that

$$\lim_{\lambda \to \infty} \int_{-\infty}^{\infty} \Bigl| x(t) - \int_{-\lambda}^{\lambda} \Bigl( 1 - \frac{|f|}{\lambda} \Bigr) \hat{x}(f)\, e^{i2\pi f t}\, df \Bigr|\, dt = 0. \tag{6.25}$$

Proof. See (Katznelson, 1976, Section VI.1.10). □ 
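
To see the reconstruction formula (6.25) at work on a discontinuous signal, the
following Python sketch evaluates the inner and outer integrals numerically. The
test signal (the indicator of the interval [-1/2, 1/2], whose FT is sinc(f)),
the truncation of the outer integral to [-3, 3], the step sizes, and all
function names are assumptions made for this illustration; it is a rough
numerical experiment, not a proof.

```python
import math


def sinc(u):
    """sinc(u) = sin(pi u) / (pi u), with sinc(0) = 1."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def rect(t):
    """Discontinuous integrable signal: 1 on [-1/2, 1/2] and 0 elsewhere."""
    return 1.0 if abs(t) <= 0.5 else 0.0


def fejer_reconstruction(t, lam, df=0.01):
    """Midpoint-rule value of the inner integral in (6.25) for x = rect.

    The FT of rect is sinc. The imaginary part of the integrand is odd in f
    and integrates to zero, so only the cosine term is summed.
    """
    n = int(round(2.0 * lam / df))
    total = 0.0
    for k in range(n):
        f = -lam + (k + 0.5) * df
        total += (1.0 - abs(f) / lam) * sinc(f) * math.cos(2.0 * math.pi * f * t)
    return total * df


def l1_error(lam, t_max=3.0, dt=0.05):
    """Midpoint-rule value of the outer integral in (6.25), truncated to |t| <= t_max."""
    n = int(round(2.0 * t_max / dt))
    total = 0.0
    for k in range(n):
        t = -t_max + (k + 0.5) * dt
        total += abs(rect(t) - fejer_reconstruction(t, lam))
    return total * dt


err_small, err_large = l1_error(4.0), l1_error(16.0)
print(err_small, err_large)  # the error for the larger lambda should be smaller
```

The printed errors should decrease as $\lambda$ grows, in line with (6.25), even though
the IFT of sinc(·) cannot reproduce the jumps of the original signal pointwise.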

Conditions under which the IFT of the FT of a signal recovers the signal are given 
in the following theorem. 

Theorem 6.2.13 (The Inversion Theorem).

(i) Suppose that x is integrable and that its FT $\hat{x}$ is also integrable. Define

$$\tilde{x} = \check{\hat{x}}. \tag{6.26}$$

Then $\tilde{x}$ is a continuous function with

$$\lim_{|t| \to \infty} \tilde{x}(t) = 0, \tag{6.27}$$

and the functions x and $\tilde{x}$ agree except on a set of Lebesgue measure zero.

(ii) Suppose that g is integrable and that its IFT $\check{g}$ is also integrable. Define

$$\tilde{g} = \hat{\check{g}}. \tag{6.28}$$

Then $\tilde{g}$ is a continuous function with

$$\lim_{|f| \to \infty} \tilde{g}(f) = 0 \tag{6.29}$$

and the functions g and $\tilde{g}$ agree except on a set of Lebesgue measure zero.
Proof. For a proof of Part (i) see (Rudin, 1974, Theorem 9.11). Part (ii) follows
by substituting g for x in Part (i) and using Proposition 6.2.3 (iii). □ 

Corollary 6.2.14. 

(i) If x is a continuous integrable signal whose FT is integrable, then

$$\check{\hat{x}} = x. \tag{6.30}$$

(ii) If g is continuous and integrable, and if $\check{g}$ is also integrable, then

$$\hat{\check{g}} = g. \tag{6.31}$$

Proof. Part (i) follows from Theorem 6.2.13 (i) by noting that if two continuous 
functions are equal outside a set of Lebesgue measure zero, then they are identical. 
Part (ii) follows similarly from Theorem 6.2.13 (ii). □ 

6.2.5 On the Brickwall and the sinc(·) Functions

We next discuss the FT and the IFT of the Brickwall function

$$\xi \mapsto \mathrm{I}\{|\xi| \le 1\}, \tag{6.32}$$

which derives its name from the shape of its plot. Since it is a symmetric function,
it follows from (6.6) that its FT and IFT are identical. Both are equal to a properly
stretched and scaled sinc(·) function (5.20).

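More generally, we offer the reader advice on how to remember that for $\alpha, \gamma > 0$,

$$t \mapsto \delta\, \mathrm{sinc}(\alpha t) \ \text{ is the IFT of } \ f \mapsto \beta\, \mathrm{I}\{|f| \le \gamma\} \tag{6.33}$$

if, and only if,

$$\delta = 2\gamma\beta \tag{6.34a}$$

and

$$\gamma\, \frac{1}{\alpha} = \frac{1}{2}. \tag{6.34b}$$

Condition (6.34a) is easily remembered because its LHS is the value at t = 0 of
$\delta\, \mathrm{sinc}(\alpha t)$ and its RHS is the value at t = 0 of the IFT of $f \mapsto \beta\, \mathrm{I}\{|f| \le \gamma\}$:

$$\int_{-\infty}^{\infty} \beta\, \mathrm{I}\{|f| \le \gamma\}\, e^{i2\pi f t}\, df\, \Big|_{t=0} = \int_{-\infty}^{\infty} \beta\, \mathrm{I}\{|f| \le \gamma\}\, df = 2\gamma\beta.$$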



Condition (6.34b) is intimately related to the Sampling Theorem that you may
have already seen and that we shall discuss in Chapter 8. Indeed, in the
Sampling Theorem (Theorem 8.4.3) the time between consecutive samples T and the
bandwidth W satisfy

$$TW = \frac{1}{2}.$$

(In this application $\alpha$ corresponds to 1/T and $\gamma$ corresponds to the bandwidth W.)
Figure 6.1: The stretched & scaled sinc(·) function and the stretched & scaled
Brickwall function above are an L2 Fourier pair if the value of the former at zero
(i.e., $\delta$) is the integral of the latter (i.e., $2 \times \beta \times$ cutoff) and if the product of the
location of the first zero of the former by the cutoff of the latter is 1/2.



It is tempting to say that Conditions (6.34) also imply that the FT of the
function $t \mapsto \delta\, \mathrm{sinc}(\alpha t)$ is the function $f \mapsto \beta\, \mathrm{I}\{|f| \le \gamma\}$, but there is a caveat. The
signal $t \mapsto \delta\, \mathrm{sinc}(\alpha t)$ is not integrable. Consequently, its $\mathcal{L}_1$-Fourier Transform
(Definition 6.2.1) is undefined. However, since it is energy-limited, its L2-Fourier
Transform is defined (Definition 6.2.7). Using Proposition 6.2.10 with the
substitution of $f \mapsto \beta\, \mathrm{I}\{|f| \le \gamma\}$ for g, we obtain that, indeed, Conditions (6.34) imply that
the L2-Fourier Transform of the (equivalence class of the) function $t \mapsto \delta\, \mathrm{sinc}(\alpha t)$
is the (equivalence class of the) function $f \mapsto \beta\, \mathrm{I}\{|f| \le \gamma\}$.

The relation between the sinc(·) and the Brickwall functions is summarized in
Figure 6.1.

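The derivation of the result is straightforward: the IFT of the Brickwall function
can be computed as

$$\begin{aligned}
\int_{-\infty}^{\infty} \beta\, \mathrm{I}\{|f| \le \gamma\}\, e^{i2\pi f t}\, df
&= \beta \int_{-\gamma}^{\gamma} e^{i2\pi f t}\, df \\
&= \frac{\beta}{i2\pi t}\, e^{i2\pi f t}\, \Big|_{-\gamma}^{\gamma} \\
&= \frac{\beta}{i2\pi t} \bigl( e^{i2\pi \gamma t} - e^{-i2\pi \gamma t} \bigr) \\
&= \frac{\beta}{\pi t}\, \sin(2\pi \gamma t) \\
&= 2\beta\gamma\, \mathrm{sinc}(2\gamma t).
\end{aligned} \tag{6.35}$$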
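
The closed form (6.35) and Conditions (6.34) can be checked numerically. The
following Python sketch approximates the IFT of the scaled Brickwall function by
the midpoint rule and compares it with $2\beta\gamma\, \mathrm{sinc}(2\gamma t)$. The numerical values of
beta and gamma, the number of quadrature points, and the function names are
assumptions chosen for illustration.

```python
import cmath
import math


def sinc(u):
    """sinc(u) = sin(pi u) / (pi u), with sinc(0) = 1, as in (5.20)."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def brickwall_ift(t, beta, gamma, n=4000):
    """Midpoint-rule approximation of the IFT of f -> beta * I{|f| <= gamma} at time t."""
    df = 2.0 * gamma / n
    total = 0j
    for k in range(n):
        f = -gamma + (k + 0.5) * df
        total += beta * cmath.exp(2j * math.pi * f * t)
    return total * df


def closed_form(t, beta, gamma):
    """Right-hand side of (6.35): 2 beta gamma sinc(2 gamma t)."""
    return 2.0 * beta * gamma * sinc(2.0 * gamma * t)


beta, gamma = 1.5, 0.75                 # illustrative values only
delta = 2.0 * gamma * beta              # Condition (6.34a)
alpha = 2.0 * gamma                     # Condition (6.34b): gamma / alpha = 1/2

for t in (0.0, 0.2, 1.0 / alpha, 1.7):
    print(t, brickwall_ift(t, beta, gamma).real, closed_form(t, beta, gamma))
```

At t = 0 both columns should equal $\delta$, and at $t = 1/\alpha$ (the first zero of the
stretched sinc) both should be close to zero.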
6.3 The Frequency Response of a Filter

Recall that in Section 5.7 we defined a filter of impulse response h to be a physical 
device that when fed the input x produces the output x*h. Of course, this is only 
meaningful if the convolution is defined. Subject to some technical assumptions 
that are made precise in Theorem 6.3.2, the FT of the output waveform x*h is the 
product of the FT of the input waveform x by the FT of the impulse response h. 
Consequently, we can think of a filter of impulse response h as a physical device 
that produces an output signal whose FT is the product of the FT of the input 
signal and the FT of the impulse response. 

The FT of the impulse response is called the frequency response of the filter. If
the filter is stable and its impulse response therefore integrable, then we define the
filter's frequency response as the Fourier Transform of the impulse response using
Definition 6.2.1 (the $\mathcal{L}_1$-Fourier Transform). If the impulse response is
energy-limited but not integrable, then we define the frequency response as the Fourier
Transform of the impulse response using the definition of the Fourier Transform for
energy-limited signals that are not integrable as in Section 6.2.3 (the L2-Fourier
Transform).

Definition 6.3.1 (Frequency Response). 

(i) The frequency response of a stable filter is the Fourier Transform of its 
impulse response as defined in Definition 6.2.1. 

(ii) The frequency response of an unstable filter whose impulse response is 
energy-limited is the L2-Fourier Transform of its impulse response as defined 
in Section 6.2.3. 

As discussed in Section 5.5, if x, h are both integrable, then $x \star h$ is defined at
all epochs t outside a set of Lebesgue measure zero, and $x \star h$ is integrable. In
this case the FT of $x \star h$ is the mapping $f \mapsto \hat{x}(f)\, \hat{h}(f)$. If x is integrable and
h is of finite energy, then $x \star h$ is also defined at all epochs t outside a set of
Lebesgue measure zero. But in this case the convolution is only guaranteed to be
of finite energy; it need not be integrable. We can discuss its Fourier Transform
using the definition of the L2-Fourier Transform for energy-limited signals that are
not integrable as in Section 6.2.3. In this case, again, the L2-Fourier Transform of
$x \star h$ is the (equivalence class of the) mapping $f \mapsto \hat{x}(f)\, \hat{h}(f)$: 7

Theorem 6.3.2 (The Fourier Transform of a Convolution). 

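(i) If the signals h and x are both integrable, then the convolution $x \star h$ is defined
for all t outside a set of Lebesgue measure zero; it is integrable; and its
$\mathcal{L}_1$-Fourier Transform $\widehat{x \star h}$ is given by

$$\widehat{x \star h}(f) = \hat{x}(f)\, \hat{h}(f), \quad f \in \mathbb{R}, \tag{6.36}$$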




7 To be precise we should say that the L2-Fourier Transform of $x \star h$ is the equivalence class of
the product of the $\mathcal{L}_1$-Fourier Transform of x by any element in the equivalence class consisting
of the L2-Fourier Transform of [h].
Figure 6.2: The frequency response of the ideal unit-gain lowpass filter of cutoff
frequency $W_c$. Notice that $W_c$ is the length of the interval of positive frequencies
where the gain is one.



where $\hat{x}$ and $\hat{h}$ are the $\mathcal{L}_1$-Fourier Transforms of x and h.

(ii) If the signal x is integrable and if h is of finite energy, then the convolution
$x \star h$ is defined for all t outside a set of Lebesgue measure zero; it is
energy-limited; and its L2-Fourier Transform $\widehat{x \star h}$ is also given by (6.36) with $\hat{x}$,
as before, being the $\mathcal{L}_1$-Fourier Transform of x but with $\hat{h}$ now being the
L2-Fourier Transform of h.

Proof. For a proof of Part (i) see, for example, (Stein and Weiss, 1990, Chapter 1, 
Section 1, Theorem 1.4). For Part (ii) see (Stein and Weiss, 1990, Chapter 1, 
Section 2, Theorem 2.6). □ 
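
Part (i) of the theorem can be illustrated with a pair of integrable signals for
which everything is known in closed form. In the following Python sketch both x
and h are taken to be the indicator of [-1/2, 1/2], whose FT is sinc(f); their
convolution is the triangle $t \mapsto \max(0, 1 - |t|)$, so by (6.36) its FT should be
$\mathrm{sinc}^2(f)$. The choice of signals, the quadrature parameters, and the function
names are assumptions made for this example.

```python
import cmath
import math


def sinc(u):
    """sinc(u) = sin(pi u) / (pi u), with sinc(0) = 1."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def rect(t):
    """Indicator of [-1/2, 1/2]; integrable, with FT equal to sinc(f)."""
    return 1.0 if abs(t) <= 0.5 else 0.0


def triangle(t):
    """Closed form of the convolution of rect with itself."""
    return max(0.0, 1.0 - abs(t))


def convolve_rect_rect(t, n=2000):
    """Midpoint-rule approximation of the convolution integral at time t."""
    d_tau = 2.0 / n
    total = 0.0
    for k in range(n):
        tau = -1.0 + (k + 0.5) * d_tau
        total += rect(tau) * rect(t - tau)
    return total * d_tau


def fourier_transform(x, f, t_min, t_max, n=4000):
    """Midpoint-rule approximation of the integral of x(t) e^{-i 2 pi f t} dt."""
    dt = (t_max - t_min) / n
    total = 0j
    for k in range(n):
        t = t_min + (k + 0.5) * dt
        total += x(t) * cmath.exp(-2j * math.pi * f * t)
    return total * dt


for f in (0.0, 0.3, 1.0, 2.5):
    lhs = fourier_transform(triangle, f, -1.0, 1.0)   # FT of the convolution
    rhs = sinc(f) * sinc(f)                           # product of the two FTs
    print(f, lhs.real, rhs)
```

The two printed columns should agree up to the quadrature error, and
convolve_rect_rect can be used to confirm numerically that the convolution is
indeed the triangle.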



As an example, recall from Section 5.9 that the unit-gain ideal lowpass filter of
cutoff frequency $W_c$ is a filter of impulse response

$$h(t) = 2W_c\, \mathrm{sinc}(2W_c t), \quad t \in \mathbb{R}. \tag{6.37}$$



This filter is not causal and not stable, but its impulse response is energy-limited.
The filter's frequency response is the L2-Fourier Transform of the impulse response
(6.37), which, using the results from Section 6.2.5, is given by (the equivalence class
of) the mapping

$$f \mapsto \mathrm{I}\{|f| \le W_c\}, \quad f \in \mathbb{R}. \tag{6.38}$$

This mapping maps all frequencies f satisfying $|f| > W_c$ to zero and all frequencies
satisfying $|f| \le W_c$ to one. It is for this reason that we use the adjective "unit-gain"
in describing this filter. We denote the mapping in (6.38) by $\widehat{\mathrm{LPF}}_{W_c}(\cdot)$ so

$$\widehat{\mathrm{LPF}}_{W_c}(f) = \mathrm{I}\{|f| \le W_c\}, \quad f \in \mathbb{R}. \tag{6.39}$$

This mapping is depicted in Figure 6.2. Note that $W_c$ is the length of the interval
of positive frequencies where the response is one.

Turning to the ideal unit-gain bandpass filter of bandwidth W around the carrier
frequency $f_c$ satisfying $f_c > W/2$, we note that, by (5.21), its time-t impulse
response $\mathrm{BPF}_{W,f_c}(t)$ is given by

$$\begin{aligned}
\mathrm{BPF}_{W,f_c}(t) &= 2W \cos(2\pi f_c t)\, \mathrm{sinc}(Wt) \\
&= 2\, \mathrm{Re}\bigl( \mathrm{LPF}_{W/2}(t)\, e^{i2\pi f_c t} \bigr).
\end{aligned} \tag{6.40}$$

Figure 6.3: The frequency response of the ideal unit-gain bandpass filter of
bandwidth W around the carrier frequency $f_c$. Notice that, as for the lowpass filter, W
is the length of the interval of positive frequencies where the gain is one.



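This filter too is noncausal and nonstable. From (6.40) and (6.39) we obtain using
Table 6.1 that its frequency response is (the equivalence class of) the mapping

$$f \mapsto \mathrm{I}\Bigl\{ \bigl| |f| - f_c \bigr| \le \frac{W}{2} \Bigr\}.$$

We denote this mapping by $\widehat{\mathrm{BPF}}_{W,f_c}(\cdot)$ so

$$\widehat{\mathrm{BPF}}_{W,f_c}(f) = \mathrm{I}\Bigl\{ \bigl| |f| - f_c \bigr| \le \frac{W}{2} \Bigr\}, \quad f \in \mathbb{R}. \tag{6.41}$$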



This mapping is depicted in Figure 6.3. Note that, as for the lowpass filter, W is 
the length of the interval of positive frequencies where the response is one. 
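
The impulse responses and frequency responses of the ideal lowpass and bandpass
filters are simple enough to write down directly. The following Python sketch
does so and checks the identity between the two expressions in (6.40). The
function names and the numerical values of W and $f_c$ are assumptions made for
this example.

```python
import cmath
import math


def sinc(u):
    """sinc(u) = sin(pi u) / (pi u), with sinc(0) = 1."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def lpf(t, wc):
    """Impulse response (6.37) of the ideal unit-gain lowpass filter."""
    return 2.0 * wc * sinc(2.0 * wc * t)


def bpf(t, w, fc):
    """First expression in (6.40): 2 W cos(2 pi fc t) sinc(W t)."""
    return 2.0 * w * math.cos(2.0 * math.pi * fc * t) * sinc(w * t)


def bpf_via_lpf(t, w, fc):
    """Second expression in (6.40): 2 Re(LPF_{W/2}(t) e^{i 2 pi fc t})."""
    return 2.0 * (lpf(t, w / 2.0) * cmath.exp(2j * math.pi * fc * t)).real


def lpf_hat(f, wc):
    """Frequency response (6.39): one for |f| <= Wc and zero otherwise."""
    return 1.0 if abs(f) <= wc else 0.0


def bpf_hat(f, w, fc):
    """Frequency response (6.41): one for ||f| - fc| <= W/2 and zero otherwise."""
    return 1.0 if abs(abs(f) - fc) <= w / 2.0 else 0.0


w, fc = 2.0, 5.0  # illustrative values with fc > W/2
for t in (-1.3, 0.0, 0.17, 2.0):
    print(t, bpf(t, w, fc), bpf_via_lpf(t, w, fc))
for f in (-5.5, -3.0, 0.0, 4.5, 6.5):
    # For fc > W/2 the passband is two shifted copies of the lowpass response.
    print(f, bpf_hat(f, w, fc), lpf_hat(f - fc, w / 2.0) + lpf_hat(f + fc, w / 2.0))
```

The second loop reflects the fact that multiplying by a complex exponential
shifts the FT, which is how (6.41) is obtained from (6.40) and (6.39).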



6.4 Bandlimited Signals and Lowpass Filtering 

In this section we define bandlimited signals and discuss lowpass filtering. We 
treat energy-limited signals and integrable signals separately. As we shall see, any 
integrable signal that is bandlimited to W Hz is also an energy-limited signal that 
is bandlimited to W Hz (Note 6.4.12). 



6.4.1 Energy-Limited Signals

The main result of this section is that the following three statements are equivalent: 

(a) The signal x is an energy-limited signal satisfying

$$(x \star \mathrm{LPF}_W)(t) = x(t), \quad t \in \mathbb{R}. \tag{6.42}$$

(b) The signal x can be expressed in the form

$$x(t) = \int_{-W}^{W} g(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R}, \tag{6.43a}$$

for some measurable function $g \colon f \mapsto g(f)$ satisfying

$$\int_{-W}^{W} |g(f)|^2\, df < \infty. \tag{6.43b}$$

(c) The signal x is a continuous energy-limited signal whose L2-Fourier
Transform $\hat{x}$ satisfies

$$\int_{-\infty}^{\infty} |\hat{x}(f)|^2\, df = \int_{-W}^{W} |\hat{x}(f)|^2\, df. \tag{6.44}$$

We can thus define x to be an energy-limited signal that is bandlimited to W Hz 
if one (and hence all) of the above conditions hold. 

In deriving this result we shall take (a) as the definition. We shall then establish
the equivalence (a) ⇔ (b) in Proposition 6.4.5, which also establishes that the
function g in (6.43a) can be taken as any element in the equivalence class of the
L2-Fourier Transform of x, and that the LHS of (6.43b) is then $\|x\|_2^2$. Finally, we
shall establish the equivalence (a) ⇔ (c) in Proposition 6.4.6.

We conclude the section with a summary of the key properties of the result of 
passing an energy-limited signal through an ideal unit-gain lowpass filter. 

We begin by defining an energy-limited signal to be bandlimited to W Hz if it is
unaltered when it is lowpass filtered by an ideal unit-gain lowpass filter of cutoff
frequency W. Recalling that we are denoting by $\mathrm{LPF}_W(t)$ the time-t impulse
response of an ideal unit-gain lowpass filter of cutoff frequency W (see (5.19)), we
have the following definition. 8

Definition 6.4.1 (Energy-Limited Bandlimited Signals). We say that the signal x
is an energy-limited signal that is bandlimited to W Hz if x is in $\mathcal{L}_2$ and

$$(x \star \mathrm{LPF}_W)(t) = x(t), \quad t \in \mathbb{R}. \tag{6.45}$$

Note 6.4.2. If an energy-limited signal that is bandlimited to W Hz is of zero 
energy, then it is the all-zero signal 0. 

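Proof. Let x be an energy-limited signal that is bandlimited to W Hz and that
has zero energy. Then

$$\begin{aligned}
|x(t)| &= \bigl| (x \star \mathrm{LPF}_W)(t) \bigr| \\
&\le \|x\|_2\, \|\mathrm{LPF}_W\|_2 \\
&= \|x\|_2\, \sqrt{2W} \\
&= 0, \quad t \in \mathbb{R},
\end{aligned}$$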


8 Even though the ideal unit-gain lowpass filter of cutoff frequency W is not stable, its impulse
response $\mathrm{LPF}_W(\cdot)$ is of finite energy (because it decays like 1/t and the integral of $1/t^2$ from one
to infinity is finite). Consequently, we can use the Cauchy-Schwarz Inequality to prove that if
$x \in \mathcal{L}_2$ then the mapping $\tau \mapsto x(\tau)\, \mathrm{LPF}_W(t - \tau)$ is integrable for every time instant $t \in \mathbb{R}$.
Consequently, the convolution $x \star \mathrm{LPF}_W$ is defined at every time instant t; see Section 5.5.
where the first equality follows because x is an energy-limited signal that is
bandlimited to W Hz and is thus unaltered when it is lowpass filtered; the subsequent
inequality follows from (5.6b); the subsequent equality by computing $\|\mathrm{LPF}_W\|_2$
using Parseval's Theorem and the explicit form of the frequency response of the
ideal unit-gain lowpass filter of bandwidth W (6.38); and where the final equality
follows from the hypothesis that x is of zero energy. □



Having defined what it means for an energy-limited signal to be bandlimited to W 
Hz, we can now define its bandwidth. 9 

Definition 6.4.3 (Bandwidth). The bandwidth of an energy-limited signal x is 
the smallest frequency W to which x is bandlimited. 

The next lemma shows that the result of passing an energy-limited signal through 
an ideal unit-gain lowpass filter of cutoff frequency W is an energy-limited signal 
that is bandlimited to W Hz. 

Lemma 6.4.4. 

(i) Let $y = x \star \mathrm{LPF}_W$ be the output of an ideal unit-gain lowpass filter of cutoff
frequency W that is fed the energy-limited input $x \in \mathcal{L}_2$. Then $y \in \mathcal{L}_2$;

$$y(t) = \int_{-W}^{W} \hat{x}(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R}; \tag{6.46}$$

and the L2-Fourier Transform of y is the (equivalence class of the) mapping

$$f \mapsto \hat{x}(f)\, \mathrm{I}\{|f| \le W\}. \tag{6.47}$$

(ii) If $g \colon f \mapsto g(f)$ is a bounded integrable function and if x is energy-limited,
then $x \star \check{g}$ is in $\mathcal{L}_2$; it can be expressed as

$$(x \star \check{g})(t) = \int_{-\infty}^{\infty} \hat{x}(f)\, g(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R}; \tag{6.48}$$

and its L2-Fourier Transform is given by (the equivalence class of) the
mapping $f \mapsto \hat{x}(f)\, g(f)$.



Proof. Even though Part (i) is a special case of Part (ii) corresponding to g being
the mapping $f \mapsto \mathrm{I}\{|f| \le W\}$, we shall prove the two parts separately. We begin
with a proof of Part (i). The idea of the proof is to express for each $t \in \mathbb{R}$ the
time-t output y(t) as an inner product and to then use Parseval's Theorem. Thus,



9 To be more rigorous we should use in this definition the term "infimum" instead of "smallest,"
but it turns out that the infimum here is also a minimum.
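(6.46) follows from the calculation

$$\begin{aligned}
y(t) &= (x \star \mathrm{LPF}_W)(t) \\
&= \int_{-\infty}^{\infty} x(\tau)\, \mathrm{LPF}_W(t - \tau)\, d\tau \\
&= \bigl\langle x, \ \tau \mapsto \mathrm{LPF}_W(t - \tau) \bigr\rangle \\
&= \bigl\langle x, \ \tau \mapsto \mathrm{LPF}_W(\tau - t) \bigr\rangle \\
&= \bigl\langle \hat{x}, \ f \mapsto e^{-i2\pi f t}\, \widehat{\mathrm{LPF}}_W(f) \bigr\rangle \\
&= \bigl\langle \hat{x}, \ f \mapsto e^{-i2\pi f t}\, \mathrm{I}\{|f| \le W\} \bigr\rangle \\
&= \int_{-W}^{W} \hat{x}(f)\, e^{i2\pi f t}\, df,
\end{aligned}$$

where the fourth equality follows from the symmetry of the function $\mathrm{LPF}_W(\cdot)$, and
where the fifth equality follows from Parseval's Theorem and the fact that delaying
a function multiplies its FT by a complex exponential. Having established (6.46),
Part (i) now follows from Proposition 6.2.10, because, by Parseval's Theorem, the
mapping $f \mapsto \hat{x}(f)\, \mathrm{I}\{|f| \le W\}$ is of finite energy and hence, by Proposition 3.4.3,
also integrable.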

We next turn to Part (ii). We first note that the assumption that g is bounded
and integrable implies that it is also energy-limited, because if $|g(f)| \le \sigma_\infty$ for all
$f \in \mathbb{R}$, then $|g(f)|^2 \le \sigma_\infty |g(f)|$ and $\int |g(f)|^2\, df \le \sigma_\infty \int |g(f)|\, df$. Thus,

$$g \in \mathcal{L}_1 \cap \mathcal{L}_2. \tag{6.49}$$

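We next prove (6.48). To that end we express the convolution $x \star \check{g}$ at time t as
an inner product and then use Parseval's Theorem to obtain

$$\begin{aligned}
(x \star \check{g})(t) &= \int_{-\infty}^{\infty} x(\tau)\, \check{g}(t - \tau)\, d\tau \\
&= \bigl\langle x, \ \tau \mapsto \check{g}^*(t - \tau) \bigr\rangle \\
&= \bigl\langle \hat{x}, \ f \mapsto e^{-i2\pi f t}\, g^*(f) \bigr\rangle \\
&= \int_{-\infty}^{\infty} \hat{x}(f)\, g(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R},
\end{aligned} \tag{6.50}$$

where the third equality follows from Parseval's Theorem and by noting that the
L2-Fourier Transform of the mapping $\tau \mapsto \check{g}^*(t - \tau)$ is the equivalence class of
the mapping $f \mapsto e^{-i2\pi f t}\, g^*(f)$, as can be verified by expressing the mapping
$\tau \mapsto \check{g}^*(t - \tau)$ as the IFT of the mapping $f \mapsto e^{-i2\pi f t}\, g^*(f)$:

$$\begin{aligned}
\check{g}^*(t - \tau) &= \Bigl( \int_{-\infty}^{\infty} g(f)\, e^{i2\pi f (t - \tau)}\, df \Bigr)^* \\
&= \int_{-\infty}^{\infty} g^*(f)\, e^{i2\pi f (\tau - t)}\, df \\
&= \int_{-\infty}^{\infty} \bigl( g^*(f)\, e^{-i2\pi f t} \bigr)\, e^{i2\pi f \tau}\, df, \quad t, \tau \in \mathbb{R},
\end{aligned}$$

and by then applying Proposition 6.2.10 to the mapping $f \mapsto g^*(f)\, e^{-i2\pi f t}$, which
is in $\mathcal{L}_1 \cap \mathcal{L}_2$ by (6.49).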

Having established (6.48) we next examine the integrand in (6.48) and note that
if $|g(f)|$ is upper-bounded by $\sigma_\infty$, then the modulus of the integrand is
upper-bounded by $\sigma_\infty |\hat{x}(f)|$, so the assumption that $x \in \mathcal{L}_2$ (and hence that $\hat{x}$ is of finite
energy) guarantees that the integrand is square integrable. Also, by the
Cauchy-Schwarz Inequality, the square integrability of g and of $\hat{x}$ implies that the integrand
is integrable. Thus, the integrand is both square integrable and integrable so, by
Proposition 6.2.10, the signal $x \star \check{g}$ is square integrable and its Fourier Transform
is the (equivalence class of the) mapping $f \mapsto \hat{x}(f)\, g(f)$. □

With the aid of the above lemma we can now give an equivalent definition for
energy-limited signals that are bandlimited to W Hz. This definition is popular
among mathematicians, because it does not involve the L2-Fourier Transform and
because the continuity of the signal is implied.

Proposition 6.4.5 (On the Definition of Bandlimited Functions in $\mathcal{L}_2$).

(i) If x is an energy-limited signal that is bandlimited to W Hz, then it can be
expressed in the form

$$x(t) = \int_{-W}^{W} g(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R}, \tag{6.51}$$

where g(·) satisfies

$$\int_{-W}^{W} |g(f)|^2\, df < \infty \tag{6.52}$$

and can be taken as (any function in the equivalence class of) $\hat{x}$.

(ii) If a signal x can be expressed as in (6.51) for some function g(·) satisfying
(6.52), then x is an energy-limited signal that is bandlimited to W Hz and $\hat{x}$
is (the equivalence class of) the mapping $f \mapsto g(f)\, \mathrm{I}\{|f| \le W\}$.

Proof. We first prove Part (i). Let x be an energy-limited signal that is
bandlimited to W Hz. Then

$$\begin{aligned}
x(t) &= (x \star \mathrm{LPF}_W)(t) \\
&= \int_{-W}^{W} \hat{x}(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R},
\end{aligned}$$

where the first equality follows from Definition 6.4.1, and where the second equality
follows from Lemma 6.4.4 (i). Consequently, if we pick g as (any element of the
equivalence class of) $f \mapsto \hat{x}(f)\, \mathrm{I}\{|f| \le W\}$, then (6.51) will be satisfied and (6.52)
will follow from Parseval's Theorem.

To prove Part (ii) define $\tilde{g} \colon f \mapsto g(f)\, \mathrm{I}\{|f| \le W\}$. From the assumption (6.52) and
from Proposition 3.4.3 it then follows that $\tilde{g} \in \mathcal{L}_1 \cap \mathcal{L}_2$. This and (6.51) imply that
$x \in \mathcal{L}_2$ and that the L2-Fourier Transform of (the equivalence class of) x is (the
equivalence class of) $\tilde{g}$; see Proposition 6.2.10. To complete the proof of Part (ii)
it thus remains to show that $x \star \mathrm{LPF}_W = x$. This follows from the calculation:

$$\begin{aligned}
(x \star \mathrm{LPF}_W)(t) &= \int_{-W}^{W} \hat{x}(f)\, e^{i2\pi f t}\, df \\
&= \int_{-W}^{W} g(f)\, e^{i2\pi f t}\, df \\
&= x(t), \quad t \in \mathbb{R},
\end{aligned}$$

where the first equality follows from Lemma 6.4.4 (i); the second because we have
already established that the L2-Fourier Transform of (the equivalence class of) x is
(the equivalence class of) $f \mapsto g(f)\, \mathrm{I}\{|f| \le W\}$; and where the last equality follows
from (6.51). □

In the engineering literature a function is often defined as bandlimited to W Hz
if its FT is zero for frequencies f outside the interval [−W, W]. This definition
is imprecise because the L2-Fourier Transform of a signal is an equivalence class
and its value at a given frequency is technically undefined. It would be better to
define an energy-limited signal as bandlimited to W Hz if $\|x\|_2^2 = \int_{-W}^{W} |\hat{x}(f)|^2\, df$
so "all its energy is contained in the frequency band [−W, W]." However, this is
not quite equivalent to our definition. For example, the L2-Fourier Transform of
the discontinuous signal

$$x(t) = \begin{cases} 17 & \text{if } t = 0, \\ \mathrm{sinc}(2Wt) & \text{otherwise,} \end{cases}$$

is (the equivalence class of) the Brickwall (frequency domain) function

$$f \mapsto \frac{1}{2W}\, \mathrm{I}\{|f| \le W\}, \quad f \in \mathbb{R}$$

(because the discontinuity at t = 0 does not influence the Fourier integral), but
the signal is altered by the lowpass filter, which smooths it out to produce the
continuous waveform $t \mapsto \mathrm{sinc}(2Wt)$. Readers who have already seen the Sampling
Theorem will note that the above signal x(·) provides a counterexample to the
Sampling Theorem as it is often imprecisely stated.
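
The effect of the lowpass filter on this signal at t = 0 can be explored
numerically. The following Python sketch approximates the convolution integral
at t = 0 by a Riemann sum on a grid that deliberately contains the point
$\tau = 0$, where the signal takes the value 17. The bandwidth W = 1/2, the truncation
of the integral to $|\tau| \le 50$, the grid spacings, and the function names are
assumptions made for this experiment.

```python
import math

W = 0.5  # illustrative bandwidth (an assumption for this example)


def sinc(u):
    """sinc(u) = sin(pi u) / (pi u), with sinc(0) = 1."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def x(t):
    """The discontinuous signal from the text: 17 at t = 0, sinc(2Wt) elsewhere."""
    return 17.0 if t == 0.0 else sinc(2.0 * W * t)


def lpf(t):
    """Impulse response of the ideal unit-gain lowpass filter of cutoff W."""
    return 2.0 * W * sinc(2.0 * W * t)


def lowpass_output_at_zero(d_tau, tau_max=50.0):
    """Riemann sum for (x * LPF_W)(0) on the grid tau = k * d_tau, |tau| <= tau_max.

    The grid contains tau = 0, so the value 17 enters the sum with weight
    d_tau; that contribution shrinks as the grid is refined.
    """
    m = int(round(tau_max / d_tau))
    total = 0.0
    for k in range(-m, m + 1):
        tau = k * d_tau
        total += x(tau) * lpf(0.0 - tau)
    return total * d_tau


coarse = lowpass_output_at_zero(0.01)
fine = lowpass_output_at_zero(0.001)
print(coarse, fine)  # should approach sinc(0) = 1, not 17, as the grid is refined
```

As the grid is refined, the single sample at $\tau = 0$ carries less and less weight,
which mirrors the statement that a discontinuity at one point does not influence
the integral. The small residual offset from 1 is due to truncating the slowly
decaying tail of the integrand.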

The following proposition clarifies the relationship between this definition and ours. 
Proposition 6.4.6 (More on the Definition of Bandlimited Functions in $\mathcal{L}_2$).

(i) If x is an energy-limited signal that is bandlimited to W Hz, then x is a
continuous function and all its energy is contained in the frequency interval
[−W, W] in the sense that its L2-Fourier Transform $\hat{x}$ satisfies

$$\int_{-\infty}^{\infty} |\hat{x}(f)|^2\, df = \int_{-W}^{W} |\hat{x}(f)|^2\, df. \tag{6.53}$$

(ii) If the signal $x \in \mathcal{L}_2$ satisfies (6.53), then x is indistinguishable from the
signal $x \star \mathrm{LPF}_W$, which is an energy-limited signal that is bandlimited to W
Hz. If in addition to satisfying (6.53) the signal x is continuous, then x is
an energy-limited signal that is bandlimited to W Hz.



Proof. This proposition's claims are a subset of those of Proposition 6.4.7, which 
summarizes some of the results relating to lowpass filtering. The proof is therefore 
omitted. □ 

Proposition 6.4.7. Let $y = x \star \mathrm{LPF}_W$ be the result of feeding the signal $x \in \mathcal{L}_2$ to
an ideal unit-gain lowpass filter of cutoff frequency W. Then:

(i) y is energy-limited with

$$\|y\|_2 \le \|x\|_2. \tag{6.54}$$

(ii) y is an energy-limited signal that is bandlimited to W Hz.

(iii) Its L2-Fourier Transform $\hat{y}$ is given by (the equivalence class of) the mapping
$f \mapsto \hat{x}(f)\, \mathrm{I}\{|f| \le W\}$.

(iv) All the energy in y is concentrated in the frequency band [−W, W] in the
sense that:

$$\int_{-\infty}^{\infty} |\hat{y}(f)|^2\, df = \int_{-W}^{W} |\hat{y}(f)|^2\, df.$$

(v) y can be represented as

$$\begin{aligned}
y(t) &= \int_{-\infty}^{\infty} \hat{y}(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R} && (6.55) \\
&= \int_{-W}^{W} \hat{x}(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R}. && (6.56)
\end{aligned}$$

(vi) y is uniformly continuous.

(vii) If $x \in \mathcal{L}_2$ has all its energy concentrated in the frequency band [−W, W] in
the sense that

$$\int_{-\infty}^{\infty} |\hat{x}(f)|^2\, df = \int_{-W}^{W} |\hat{x}(f)|^2\, df, \tag{6.57}$$

then x is indistinguishable from the bandlimited signal $x \star \mathrm{LPF}_W$.

(viii) x is an energy-limited signal that is bandlimited to W Hz if, and only if, it
satisfies all three of the following conditions: it is in $\mathcal{L}_2$; it is continuous;
and it satisfies (6.57).
Proof. Part (i) follows from Lemma 6.4.4 (i), which demonstrates that $\hat{y}$ is (the
equivalence class of) the mapping $f \mapsto \hat{x}(f)\, \mathrm{I}\{|f| \le W\}$ so, by Parseval's Theorem,

$$\begin{aligned}
\|y\|_2^2 &= \int_{-\infty}^{\infty} |\hat{y}(f)|^2\, df \\
&= \int_{-W}^{W} |\hat{x}(f)|^2\, df \\
&\le \int_{-\infty}^{\infty} |\hat{x}(f)|^2\, df \\
&= \|x\|_2^2.
\end{aligned}$$
Part (ii) follows because, by Lemma 6.4.4 (i), the signal y satisfies

$$y(t) = \int_{-W}^{W} \hat{x}(f)\, e^{i2\pi f t}\, df$$

where

$$\int_{-W}^{W} |\hat{x}(f)|^2\, df \le \int_{-\infty}^{\infty} |\hat{x}(f)|^2\, df = \|x\|_2^2 < \infty,$$

so, by Proposition 6.4.5, y is an energy-limited signal that is bandlimited to W Hz.

Part (iii) follows directly from Lemma 6.4.4 (i). Part (iv) follows from Part (iii).
Part (v) follows, again, directly from Lemma 6.4.4.

Part (vi) follows from the representation (6.56); from the fact that the IFT of
integrable functions is uniformly continuous (Theorem 6.2.11); and because the
condition $\|x\|_2 < \infty$ implies, by Proposition 3.4.3, that $f \mapsto \hat{x}(f)\, \mathrm{I}\{|f| \le W\}$ is
integrable.

To prove Part (vii) we note that by Part (ii) $x \star \mathrm{LPF}_W$ is an energy-limited signal
that is bandlimited to W Hz, and we note that (6.57) implies that x is
indistinguishable from $x \star \mathrm{LPF}_W$ because

$$\begin{aligned}
\|x - x \star \mathrm{LPF}_W\|_2^2 &= \int_{-\infty}^{\infty} \bigl| \hat{x}(f) - \widehat{x \star \mathrm{LPF}_W}(f) \bigr|^2\, df \\
&= \int_{-\infty}^{\infty} \bigl| \hat{x}(f) - \hat{x}(f)\, \mathrm{I}\{|f| \le W\} \bigr|^2\, df \\
&= \int_{|f| > W} |\hat{x}(f)|^2\, df \\
&= 0,
\end{aligned}$$

where the first equality follows from Parseval's Theorem; the second equality from
Lemma 6.4.4 (i); the third equality because the integrand is zero for $|f| \le W$; and
the final equality from (6.57).

To prove Part (viii) define $y = x \star \mathrm{LPF}_W$ and note that if x is an energy-limited
signal that is bandlimited to W Hz then, by Definition 6.4.1, y = x so the continuity
of x and the fact that its energy is concentrated in the interval [−W, W] follow from
Parts (iv) and (vi). In the other direction, if x satisfies (6.57) then by Part (vii)
it is indistinguishable from the signal y, which is continuous by Part (vi). If,
additionally, x is continuous, then x must be identical to y because two continuous
functions that are indistinguishable must be identical. □
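
Parts (i), (iii), and (v) of Proposition 6.4.7 lend themselves to a quick
numerical illustration in the frequency domain. The following Python sketch uses
the Gaussian $x(t) = e^{-\pi t^2}$, whose FT is $\hat{x}(f) = e^{-\pi f^2}$ (a standard transform pair),
and compares the numerically computed quantities with closed forms involving the
error function. The choice of signal, the cutoff frequencies, the quadrature
parameters, and the function names are assumptions made for this example.

```python
import math


def x_hat(f):
    """FT of x(t) = exp(-pi t^2), namely exp(-pi f^2)."""
    return math.exp(-math.pi * f * f)


def y_hat(f, w):
    """Frequency-domain description of y = x * LPF_W, as in Part (iii)."""
    return x_hat(f) if abs(f) <= w else 0.0


def integrate(func, a, b, n=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return sum(func(a + (k + 0.5) * h) for k in range(n)) * h


def output_energy(w, f_max=6.0):
    """Energy of y computed via Parseval's Theorem from y_hat."""
    return integrate(lambda f: abs(y_hat(f, w)) ** 2, -f_max, f_max)


def output_at(t, w):
    """Representation (6.56); the sine part is odd in f and integrates to zero."""
    return integrate(lambda f: x_hat(f) * math.cos(2.0 * math.pi * f * t), -w, w)


input_energy = integrate(lambda f: x_hat(f) ** 2, -6.0, 6.0)
for w in (0.25, 0.5, 1.0):
    closed = math.erf(math.sqrt(2.0 * math.pi) * w) / math.sqrt(2.0)
    print(w, output_energy(w), closed, input_energy)
```

For each cutoff the output energy should not exceed the input energy, in line
with (6.54), and it should approach the input energy as W grows.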



6.4.2 Integrable Signals 

We next discuss what we mean when we say that x is an integrable signal that is 
bandlimited to W Hz. Also important will be Note 6.4.11, which establishes that 
if x is such a signal, then x is equal to the IFT of its FT. 

Even though the ideal unit-gain lowpass filter is unstable, its convolution with any
integrable signal is well-defined. Denoting the cutoff frequency by $W_c$ we have:

Proposition 6.4.8. For any $x \in \mathcal{L}_1$ the convolution integral

$$\int_{-\infty}^{\infty} x(\tau)\, \mathrm{LPF}_{W_c}(t - \tau)\, d\tau$$

is defined at every epoch $t \in \mathbb{R}$ and is given by

$$\int_{-\infty}^{\infty} x(\tau)\, \mathrm{LPF}_{W_c}(t - \tau)\, d\tau = \int_{-W_c}^{W_c} \hat{x}(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{6.58}$$

Moreover, $x \star \mathrm{LPF}_{W_c}$ is an energy-limited function that is bandlimited to $W_c$ Hz.
Its L2-Fourier Transform is (the equivalence class of) the mapping

$$f \mapsto \hat{x}(f)\, \mathrm{I}\{|f| \le W_c\}.$$

Proof. The key to the proof is to note that, although the sinc(·) function is not
integrable, it follows from (6.35) that it can be represented as the Inverse Fourier
Transform of an integrable function (of frequency). Consequently, the existence
of the convolution and its representation as (6.58) follow directly from
Proposition 6.2.5 and (6.35).

To prove the remaining assertions of the proposition we note that, since x is
integrable, it follows from Theorem 6.2.11 that $|\hat{x}(f)| \le \|x\|_1$ and hence

$$\int_{-W_c}^{W_c} |\hat{x}(f)|^2\, df < \infty. \tag{6.59}$$

The result now follows from (6.58), (6.59), and Proposition 6.4.5. □

With the aid of Proposition 6.4.8 we can now define bandlimited integrable signals: 

Definition 6.4.9 (Bandlimited Integrable Signals). We say that the signal x is 
an integrable signal that is bandlimited to W Hz if x is integrable and if it 
is unaltered when it is lowpass filtered by an ideal unit-gain lowpass filter of cutoff 
frequency W: 

$$x(t) = (x \star \mathrm{LPF}_W)(t), \quad t \in \mathbb{R}.$$
Proposition 6.4.10 (Characterizing Integrable Signals that Are Bandlimited to
W Hz). If x is an integrable signal, then each of the following statements is
equivalent to the statement that x is an integrable signal that is bandlimited to W Hz:

(a) The signal x is unaltered when it is lowpass filtered:

$$x(t) = (x \star \mathrm{LPF}_W)(t), \quad t \in \mathbb{R}. \tag{6.60}$$

(b) The signal x can be expressed as

$$x(t) = \int_{-W}^{W} \hat{x}(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{6.61}$$

(c) The signal x is continuous and

$$\hat{x}(f) = 0, \quad |f| > W. \tag{6.62}$$

(d) There exists an integrable function g such that

$$x(t) = \int_{-W}^{W} g(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{6.63}$$

Proof. Condition (a) is the condition given in Definition 6.4.9, so it only remains
to show that the four conditions are equivalent. We proceed to do so by proving
that (a) ⇔ (b); that (b) ⇒ (d); that (d) ⇒ (c); and that (c) ⇒ (b).

That (a) ⇔ (b) follows directly from Proposition 6.4.8 and, more specifically, from
the representation (6.58). The implication (b) ⇒ (d) is obvious because nothing
precludes us from picking g to be the mapping $f \mapsto \hat{x}(f)\, \mathrm{I}\{|f| \le W\}$, which is
integrable because $\hat{x}$ is bounded by $\|x\|_1$ (Theorem 6.2.11).

We next prove that (d) ⇒ (c). We thus assume that there exists an integrable
function g such that (6.63) holds and proceed to prove that x is continuous and
that (6.62) holds. To that end we first note that the integrability of g implies,
by Theorem 6.2.11, that x (= $\check{g}$) is continuous. It thus remains to prove that $\hat{x}$
satisfies (6.62). Define $g_0$ as the mapping $f \mapsto g(f)\, \mathrm{I}\{|f| \le W\}$. By (6.63) it then
follows that $x = \check{g}_0$. Consequently,

$$\hat{x} = \hat{\check{g}}_0. \tag{6.64}$$

Employing Theorem 6.2.13 (ii) we conclude that the RHS of (6.64) is equal to $g_0$
outside a set of Lebesgue measure zero, so (6.64) implies that $\hat{x}$ is indistinguishable
from $g_0$. Since both $\hat{x}$ and $g_0$ are continuous for $|f| > W$, this implies that
$\hat{x}(f) = g_0(f)$ for all frequencies $|f| > W$. Since, by its definition, $g_0(f) = 0$
whenever $|f| > W$ we can conclude that (6.62) holds.

Finally (c) ⇒ (b) follows directly from Theorem 6.2.13 (i). □

From Proposition 6.4.10 (cf. (b) and (c)) we obtain: 

Note 6.4.11. If x is an integrable signal that is bandlimited to W Hz, then it is 
equal to the IFT of its FT. 
By Proposition 6.4.10 it also follows that if x is an integrable signal that is
bandlimited to W Hz, then (6.61) is satisfied. Since the integrand in (6.61) is
bounded (by $\|x\|_1$) it follows that the integrand is square integrable over the
interval [−W, W]. Consequently, by Proposition 6.4.5, x must be an energy-limited
signal that is bandlimited to W Hz. We have thus proved:

Note 6.4.12. An integrable signal that is bandlimited to W Hz is also an energy- 
limited signal that is bandlimited to W Hz. 

The reverse statement is not true: sinc(·) is an energy-limited signal that is
bandlimited to 1/2 Hz, but it is not integrable.
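
The contrast between the two notions can be seen numerically. The following is a minimal sketch, not part of the original text: it assumes the normalized definition sinc(t) = sin(πt)/(πt), and the helper names `midpoint_integral`, `abs_sinc_mass`, and `sinc_energy` are hypothetical. It approximates the integrals of |sinc| and of sinc² over [-T, T]; the former keeps growing (roughly logarithmically in T), whereas the latter settles near 1.

```python
import math


def sinc(t: float) -> float:
    """Normalized sinc, sin(pi t)/(pi t), with sinc(0) = 1 (assumed convention)."""
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)


def midpoint_integral(func, a: float, b: float, steps: int) -> float:
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))


def abs_sinc_mass(T: int, steps_per_unit: int = 200) -> float:
    """Approximate integral of |sinc(t)| over [-T, T]; grows without bound in T."""
    return midpoint_integral(lambda t: abs(sinc(t)), -T, T, 2 * T * steps_per_unit)


def sinc_energy(T: int, steps_per_unit: int = 200) -> float:
    """Approximate integral of sinc(t)^2 over [-T, T]; converges as T grows."""
    return midpoint_integral(lambda t: sinc(t) ** 2, -T, T, 2 * T * steps_per_unit)


if __name__ == "__main__":
    for T in (10, 100, 1000):
        print(T, round(abs_sinc_mass(T), 3), round(sinc_energy(T), 4))
```

A finite computation cannot prove non-integrability, of course; the sketch only illustrates the growth that the analytic argument establishes.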

The definition of bandwidth for integrable signals is similar to Definition 6.4.3. 10 

Definition 6.4.13 (Bandwidth). The bandwidth of an integrable signal is the 
smallest frequency W to which it is bandlimited. 



6.5 Bandlimited Signals Through Stable Filters 

In this section we discuss the result of feeding bandlimited signals to stable filters. 
We begin with energy-limited signals. In Theorem 6.3.2 we saw that the convo- 
lution of an integrable signal with an energy-limited signal is defined at all times 
outside a set of Lebesgue measure zero. The next proposition shows that if the 
energy-limited signal is bandlimited to W Hz, then the convolution is defined at
every time, and the result is an energy-limited signal that is bandlimited to W Hz. 

Proposition 6.5.1. Let x be an energy-limited signal that is bandlimited to W Hz
and let h be integrable. Then $x \star h$ is defined for every $t \in \mathbb{R}$; it is an energy-limited
signal that is bandlimited to W Hz; and it can be represented as

$$(x \star h)(t) = \int_{-W}^{W} \hat{x}(f)\, \hat{h}(f)\, e^{i 2\pi f t}\, \mathrm{d}f, \quad t \in \mathbb{R}. \tag{6.65}$$

Proof. Since x is an energy-limited signal that is bandlimited to W Hz, it follows
from Proposition 6.4.5 that

$$x(t) = \int_{-W}^{W} \hat{x}(f)\, e^{i 2\pi f t}\, \mathrm{d}f, \quad t \in \mathbb{R}, \tag{6.66}$$

with the mapping $f \mapsto \hat{x}(f)\, \mathrm{I}\{|f| \le W\}$ being square integrable and hence, by
Proposition 3.4.3, also integrable. Thus the convolution $x \star h$ is the convolution
between the IFT of the integrable mapping $f \mapsto \hat{x}(f)\, \mathrm{I}\{|f| \le W\}$ and the integrable
function h. By Proposition 6.2.5 we thus obtain that the convolution $x \star h$ is defined
at every time t and has the representation (6.65). The proposition will now follow
from (6.65) and Proposition 6.4.5 once we demonstrate that

$$\int_{-W}^{W} \bigl|\hat{x}(f)\, \hat{h}(f)\bigr|^2\, \mathrm{d}f < \infty.$$

This can be proved by upper-bounding $|\hat{h}(f)|$ by $\|h\|_1$ (Theorem 6.2.11) and by
then using Parseval's Theorem. □

10 Again, we omit the proof that the infimum is a minimum.

We next turn to integrable signals passed through stable filters. 

Proposition 6.5.2 (Integrable Bandlimited Signals through Stable Filters). Let x
be an integrable signal that is bandlimited to W Hz, and let h be integrable. Then
the convolution $x \star h$ is defined for every $t \in \mathbb{R}$; it is an integrable signal that is
bandlimited to W Hz; and it can be represented as

$$(x \star h)(t) = \int_{-W}^{W} \hat{x}(f)\, \hat{h}(f)\, e^{i 2\pi f t}\, \mathrm{d}f, \quad t \in \mathbb{R}. \tag{6.67}$$

Proof. Since every integrable signal that is bandlimited to W Hz is also an energy-
limited signal that is bandlimited to W Hz, it follows from Proposition 6.5.1 that the
convolution $x \star h$ is defined at every epoch and that it can be represented as (6.65).
Alternatively, one can derive this representation from (6.61) and Proposition 6.2.5.
It only remains to show that $x \star h$ is integrable, but this follows because the
convolution of two integrable functions is integrable (5.9). □

6.6 The Bandwidth of a Product of Two Signals 

In this section we discuss the bandwidth of the product of two bandlimited signals. 
The result is a straightforward consequence of the fact that the FT of a product 
of two signals is the convolution of their FTs. We begin with the following result 
on the FT of a product of signals. 

Proposition 6.6.1 (The FT of a Product Is the Convolution of the FTs). If $x_1$
and $x_2$ are energy-limited signals, then their product

$$t \mapsto x_1(t)\, x_2(t)$$

is an integrable function whose FT is the mapping

$$f \mapsto (\hat{x}_1 \star \hat{x}_2)(f).$$

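Proof. Let $x_1$ and $x_2$ be energy-limited signals, and denote their product by y:

$$y(t) = x_1(t)\, x_2(t), \quad t \in \mathbb{R}.$$

Since both $x_1$ and $x_2$ are square integrable, it follows from the Cauchy-Schwarz
Inequality that their product y is integrable and that

$$\|y\|_1 \le \|x_1\|_2\, \|x_2\|_2. \tag{6.68}$$

Having established that the product is integrable, we next derive its FT and show
that

$$\hat{y}(f) = (\hat{x}_1 \star \hat{x}_2)(f), \quad f \in \mathbb{R}. \tag{6.69}$$

This is done by expressing $\hat{y}(f)$ as an inner product between two finite-energy
functions and by then using Parseval's Theorem:

$$\begin{aligned}
\hat{y}(f) &= \int_{-\infty}^{\infty} y(t)\, e^{-i 2\pi f t}\, \mathrm{d}t \\
&= \int_{-\infty}^{\infty} x_1(t)\, x_2(t)\, e^{-i 2\pi f t}\, \mathrm{d}t \\
&= \bigl\langle t \mapsto x_1(t),\; t \mapsto x_2^*(t)\, e^{i 2\pi f t} \bigr\rangle \\
&= \int_{-\infty}^{\infty} \hat{x}_1(\tilde{f})\, \hat{x}_2(f - \tilde{f})\, \mathrm{d}\tilde{f} \\
&= (\hat{x}_1 \star \hat{x}_2)(f), \quad f \in \mathbb{R}. \qquad \Box
\end{aligned}$$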
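
Proposition 6.6.1 can be spot-checked numerically. The sketch below is an illustration under stated assumptions, not part of the original text: it takes $x_1(t) = x_2(t) = e^{-\pi t^2}$ (our choice of example), relies on the standard transform pair that this Gaussian is its own FT, and uses the hypothetical helpers `ft_of_product` and `conv_of_fts`, which approximate the two sides of (6.69) by midpoint sums over a truncated interval.

```python
import cmath
import math


def gauss(u: float) -> float:
    """exp(-pi u^2); serves both as x1 = x2 and as their FT (standard pair)."""
    return math.exp(-math.pi * u * u)


def ft_of_product(f: float, half_width: float = 6.0, steps: int = 6000) -> complex:
    """Midpoint-rule FT of y(t) = x1(t) x2(t), evaluated at frequency f."""
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps):
        t = -half_width + (k + 0.5) * h
        total += gauss(t) * gauss(t) * cmath.exp(-2j * math.pi * f * t)
    return total * h


def conv_of_fts(f: float, half_width: float = 6.0, steps: int = 6000) -> float:
    """Midpoint-rule convolution of the two FTs, evaluated at frequency f."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        u = -half_width + (k + 0.5) * h
        total += gauss(u) * gauss(f - u)
    return total * h


if __name__ == "__main__":
    for f in (0.0, 0.5, 1.3):
        print(f, abs(ft_of_product(f) - conv_of_fts(f)))
```

The truncation to [-6, 6] is harmless here only because the Gaussian decays so quickly; for slowly decaying signals such as sinc(·) a much wider window would be needed.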

Proposition 6.6.2. Let $x_1$ and $x_2$ be energy-limited signals that are bandlimited to
$W_1$ Hz and $W_2$ Hz respectively. Then their product is an energy-limited signal that
is bandlimited to $W_1 + W_2$ Hz.

Proof. We will show that

$$x_1(t)\, x_2(t) = \int_{-(W_1+W_2)}^{W_1+W_2} g(f)\, e^{i 2\pi f t}\, \mathrm{d}f, \quad t \in \mathbb{R}, \tag{6.70}$$

where the function g(·) satisfies

$$\int_{-(W_1+W_2)}^{W_1+W_2} |g(f)|^2\, \mathrm{d}f < \infty. \tag{6.71}$$

The result will then follow from Proposition 6.4.5.

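To establish (6.70) we begin by noting that since $x_1$ is of finite energy and
bandlimited to $W_1$ Hz we have by Proposition 6.4.5

$$x_1(t) = \int_{-W_1}^{W_1} \hat{x}_1(f_1)\, e^{i 2\pi f_1 t}\, \mathrm{d}f_1, \quad t \in \mathbb{R}.$$

Similarly,

$$x_2(t) = \int_{-W_2}^{W_2} \hat{x}_2(f_2)\, e^{i 2\pi f_2 t}\, \mathrm{d}f_2, \quad t \in \mathbb{R}.$$

Consequently

$$\begin{aligned}
x_1(t)\, x_2(t) &= \int_{-W_1}^{W_1} \hat{x}_1(f_1)\, e^{i 2\pi f_1 t}\, \mathrm{d}f_1 \int_{-W_2}^{W_2} \hat{x}_2(f_2)\, e^{i 2\pi f_2 t}\, \mathrm{d}f_2 \\
&= \int_{-W_1}^{W_1} \int_{-W_2}^{W_2} \hat{x}_1(f_1)\, \hat{x}_2(f_2)\, e^{i 2\pi (f_1 + f_2) t}\, \mathrm{d}f_1\, \mathrm{d}f_2 \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{x}_1(f_1)\, \hat{x}_2(f_2)\, e^{i 2\pi (f_1 + f_2) t}\, \mathrm{d}f_1\, \mathrm{d}f_2 \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{x}_1(\tilde{f})\, \hat{x}_2(f - \tilde{f})\, e^{i 2\pi f t}\, \mathrm{d}\tilde{f}\, \mathrm{d}f \\
&= \int_{-\infty}^{\infty} e^{i 2\pi f t}\, (\hat{x}_1 \star \hat{x}_2)(f)\, \mathrm{d}f \\
&= \int_{-\infty}^{\infty} e^{i 2\pi f t}\, g(f)\, \mathrm{d}f, \quad t \in \mathbb{R},
\end{aligned} \tag{6.72}$$

where

$$g(f) = \int_{-\infty}^{\infty} \hat{x}_1(\tilde{f})\, \hat{x}_2(f - \tilde{f})\, \mathrm{d}\tilde{f}, \quad f \in \mathbb{R}. \tag{6.73}$$

Here the second equality follows from Fubini's Theorem;$^{11}$ the third because $x_1$
and $x_2$ are bandlimited to $W_1$ and $W_2$ Hz respectively; and the fourth by introducing
the variables $f = f_1 + f_2$ and $\tilde{f} = f_1$.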

To establish (6.70) we now need to show that because $x_1$ and $x_2$ are bandlimited
to $W_1$ and $W_2$ Hz respectively, it follows that

$$g(f) = 0, \quad |f| > W_1 + W_2. \tag{6.74}$$

To prove this we note that because $x_1$ and $x_2$ are bandlimited to $W_1$ Hz and $W_2$
Hz respectively, we can rewrite (6.73) as

$$g(f) = \int_{-\infty}^{\infty} \hat{x}_1(\tilde{f})\, \mathrm{I}\{|\tilde{f}| \le W_1\}\; \hat{x}_2(f - \tilde{f})\, \mathrm{I}\{|f - \tilde{f}| \le W_2\}\, \mathrm{d}\tilde{f}, \quad f \in \mathbb{R}, \tag{6.75}$$

and the product $\mathrm{I}\{|\tilde{f}| \le W_1\}\, \mathrm{I}\{|f - \tilde{f}| \le W_2\}$ is zero for all frequencies f satisfying
$|f| > W_1 + W_2$.

Having established (6.70) using (6.72) and (6.74), we now proceed to prove (6.71)
by showing that the integrand in (6.71) is bounded. We do so by noting that
the integrand in (6.71) is the convolution of two square-integrable functions ($\hat{x}_1$
and $\hat{x}_2$) so by (5.6b) (with the dummy variable now being f) we have

$$|g(f)| \le \|\hat{x}_1\|_2\, \|\hat{x}_2\|_2 = \|x_1\|_2\, \|x_2\|_2 < \infty, \quad f \in \mathbb{R}. \qquad \Box$$



6.7 Bernstein's Inequality 

Bernstein's Inequality captures the engineering intuition that the rate at which 
a bandlimited signal can change is proportional to its bandwidth. The way the 
theorem is phrased makes it clear that it is applicable both to integrable signals 
that are bandlimited to W Hz and to energy-limited signals that are bandlimited 
to W Hz. 

Theorem 6.7.1 (Bernstein's Inequality). If x can be written as

$$x(t) = \int_{-W}^{W} g(f)\, e^{i 2\pi f t}\, \mathrm{d}f, \quad t \in \mathbb{R}$$

for some integrable function g, then

$$\left| \frac{\mathrm{d}x(t)}{\mathrm{d}t} \right| \le 4\pi W \sup_{\tau \in \mathbb{R}} |x(\tau)|, \quad t \in \mathbb{R}. \tag{6.76}$$

Proof. A proof of a slightly more general version of this theorem can be found in
(Pinsky, 2002, Chapter 2, Section 2.3.8). □

11 The fact that $\int_{-W_1}^{W_1} |\hat{x}_1(f)|\, \mathrm{d}f$ is finite follows from the finiteness of $\int_{-W_1}^{W_1} |\hat{x}_1(f)|^2\, \mathrm{d}f$ (which
follows from Parseval's Theorem) and from Proposition 3.4.3. The same argument applies to $x_2$.
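
As a sanity check of (6.76), one can take g to be identically one on $[-W, W]$, which gives $x(t) = 2W\, \mathrm{sinc}(2Wt) = \sin(2\pi W t)/(\pi t)$. The sketch below is only an illustration: the value W = 3, the grid, and the helper names `slope` and `bernstein_ratio` are our assumptions. It estimates the largest slope by central differences and compares it with the RHS of (6.76).

```python
import math

W = 3.0  # hypothetical bandwidth in Hz, chosen only for illustration


def x(t: float) -> float:
    """x(t) = integral over [-W, W] of exp(i 2 pi f t) df = sin(2 pi W t)/(pi t)."""
    if t == 0.0:
        return 2.0 * W
    return math.sin(2.0 * math.pi * W * t) / (math.pi * t)


def slope(t: float, h: float = 1e-6) -> float:
    """Central-difference estimate of dx/dt at time t."""
    return (x(t + h) - x(t - h)) / (2.0 * h)


def bernstein_ratio(t_max: float = 5.0, step: float = 1e-3) -> float:
    """Largest observed |dx/dt| on a grid, divided by 4 pi W sup|x| on that grid."""
    n = int(round(t_max / step))
    grid = [k * step for k in range(-n, n + 1)]
    sup_x = max(abs(x(t)) for t in grid)
    max_slope = max(abs(slope(t)) for t in grid)
    return max_slope / (4.0 * math.pi * W * sup_x)


if __name__ == "__main__":
    print(bernstein_ratio())  # a value below 1 is consistent with (6.76)
```

For this particular signal the ratio stays well below one, so the example shows consistency with the bound rather than tightness.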



6.8 Time-Limited and Bandlimited Signals 

In this section we prove that no nonzero signal can be both time-limited and 
bandlimited. We shall present two proofs. The first is based on Theorem 6.8.1, 
which establishes a connection between bandlimited signals and entire functions. 
The second is based on the Fourier Series. 

We remind the reader that a function $\xi : \mathbb{C} \to \mathbb{C}$ is an entire function if it is
analytic throughout the complex plane.

Theorem 6.8.1. If x is an energy-limited signal that is bandlimited to W Hz, then
there exists an entire function $\xi : \mathbb{C} \to \mathbb{C}$ that agrees with x on the real axis

$$\xi(t + i0) = x(t), \quad t \in \mathbb{R} \tag{6.77}$$

and that satisfies

$$|\xi(z)| \le \gamma\, e^{2\pi W |z|}, \quad z \in \mathbb{C}, \tag{6.78}$$

where $\gamma$ is some constant that can be taken as $\sqrt{2W}\, \|x\|_2$.

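Proof. Let x be an energy-limited signal that is bandlimited to W Hz. By Proposition
6.4.5 we can express x as

$$x(t) = \int_{-W}^{W} g(f)\, e^{i 2\pi f t}\, \mathrm{d}f, \quad t \in \mathbb{R} \tag{6.79}$$

for some square-integrable function g satisfying

$$\int_{-W}^{W} |g(f)|^2\, \mathrm{d}f = \|x\|_2^2. \tag{6.80}$$

Consider now the function $\xi : \mathbb{C} \to \mathbb{C}$ defined by

$$\xi(z) = \int_{-W}^{W} g(f)\, e^{i 2\pi f z}\, \mathrm{d}f, \quad z \in \mathbb{C}. \tag{6.81}$$

This function is well-defined for every $z \in \mathbb{C}$ because in the region of integration
the integrand can be bounded by

$$\begin{aligned}
\bigl| g(f)\, e^{i 2\pi f z} \bigr| &= |g(f)|\, e^{-2\pi f\, \mathrm{Im}(z)} \\
&\le |g(f)|\, e^{2\pi |f|\, |\mathrm{Im}(z)|} \\
&\le |g(f)|\, e^{2\pi W |z|}, \quad |f| \le W,
\end{aligned} \tag{6.82}$$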




and the RHS of (6.82) is integrable over the interval [— W, W] by (6.80) and Propo- 
sition 3.4.3. 

By (6.79) and (6.81) it follows that $\xi$ is an extension of the function x in the sense
of (6.77). It is but a technical matter to prove that $\xi$ is analytic. One approach is
to prove that it is differentiable at every $z \in \mathbb{C}$ by verifying that the swapping of
differentiation and integration, which leads to

$$\frac{\mathrm{d}\xi}{\mathrm{d}z}(z) = \int_{-W}^{W} g(f)\, (i 2\pi f)\, e^{i 2\pi f z}\, \mathrm{d}f, \quad z \in \mathbb{C},$$

is justified. See (Rudin, 1974, Section 19.1) for a different approach.
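To prove (6.78) we compute

$$\begin{aligned}
|\xi(z)| &= \left| \int_{-W}^{W} g(f)\, e^{i 2\pi f z}\, \mathrm{d}f \right| \\
&\le \int_{-W}^{W} \bigl| g(f)\, e^{i 2\pi f z} \bigr|\, \mathrm{d}f \\
&\le e^{2\pi W |z|} \int_{-W}^{W} |g(f)|\, \mathrm{d}f \\
&\le e^{2\pi W |z|}\, \sqrt{2W}\, \sqrt{\int_{-W}^{W} |g(f)|^2\, \mathrm{d}f} \\
&= \sqrt{2W}\, \|x\|_2\, e^{2\pi W |z|},
\end{aligned}$$

where the inequality in the second line follows from Proposition 2.4.1; the inequality
in the third line from (6.82); the inequality in the fourth line from Proposition 3.4.3;
and the final equality from (6.80). □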
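
The bound (6.78) can be illustrated on a concrete case. The sketch below is not part of the original text and rests on our own choices: g is taken to be identically one on $[-W, W]$ with W = 1, so that (6.81) gives $\xi(z) = \sin(2\pi W z)/(\pi z)$, (6.80) gives $\|x\|_2^2 = 2W$, and the constant becomes $\gamma = \sqrt{2W}\,\|x\|_2 = 2W$. The helper name `worst_ratio` is hypothetical; it scans a grid of complex points and reports the largest value of $|\xi(z)|$ divided by the bound.

```python
import cmath
import math

W = 1.0  # hypothetical bandwidth in Hz, chosen only for illustration
GAMMA = 2.0 * W  # sqrt(2W) * ||x||_2, using ||x||_2^2 = 2W when g = 1 on [-W, W]


def xi(z: complex) -> complex:
    """xi(z) = integral over [-W, W] of exp(i 2 pi f z) df = sin(2 pi W z)/(pi z)."""
    if z == 0:
        return complex(2.0 * W)
    return cmath.sin(2.0 * math.pi * W * z) / (math.pi * z)


def worst_ratio(extent: float = 3.0, step: float = 0.25) -> float:
    """Max over a square grid of |xi(z)| / (GAMMA * exp(2 pi W |z|))."""
    n = int(round(extent / step))
    worst = 0.0
    for a in range(-n, n + 1):
        for b in range(-n, n + 1):
            z = complex(a * step, b * step)
            bound = GAMMA * math.exp(2.0 * math.pi * W * abs(z))
            worst = max(worst, abs(xi(z)) / bound)
    return worst


if __name__ == "__main__":
    print(worst_ratio())  # never exceeds 1 if (6.78) holds on the grid
```

On this grid the ratio attains its maximum at z = 0, where both sides equal 2W; away from the origin the exponential bound is loose.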



Using Theorem 6.8.1 we can now easily prove the main result of this section. 

Theorem 6.8.2. Let W and T be fixed nonnegative real numbers. If x is an energy-
limited signal that is bandlimited to W Hz and that is time-limited in the sense that
it is zero for all $t \notin [-T/2, T/2]$, then $x(t) = 0$ for all $t \in \mathbb{R}$.

By Note 6.4.12 this theorem also holds for integrable bandlimited signals. 



Proof. By Theorem 6.8.1 x can be extended to an entire function $\xi$. Since x has
infinitely many zeros in a bounded interval (e.g., for all $t \in [T, 2T]$) and since $\xi$
agrees with x on the real line, it follows that $\xi$ also has infinitely many zeros
in a bounded set (e.g., whenever $z \in \{w \in \mathbb{C} : \mathrm{Im}(w) = 0,\ \mathrm{Re}(w) \in [T, 2T]\}$).
Consequently, $\xi$ is an entire function that has infinitely many zeros in a bounded
subset of the complex plane and is thus the all-zero function (Rudin, 1974, Theorem
10.18). But since x and $\xi$ agree on the real line, it follows that x is also the
all-zero function. □

Another proof can be based on the Fourier Series, which is discussed in the appendix.
Starting from (6.79) we obtain that the time-$\eta/(2W)$ sample of x(·) satisfies

$$\frac{1}{\sqrt{2W}}\, x\Bigl(\frac{\eta}{2W}\Bigr) = \int_{-W}^{W} g(f)\, \frac{1}{\sqrt{2W}}\, e^{i 2\pi f \eta/(2W)}\, \mathrm{d}f, \quad \eta \in \mathbb{Z},$$

where we recognize the RHS of the above as the $\eta$-th Fourier Series Coefficient of
the function $f \mapsto g(f)\, \mathrm{I}\{|f| \le W\}$ with respect to the interval $[-W, W)$ (Note A.3.5
on Page 693). But since $x(t) = 0$ whenever $|t| > T/2$, it follows that only a finite
number of these samples can be nonzero, thus leading us to conclude that all but a
finite number of the Fourier Series Coefficients of g(·) are zero. By the uniqueness
theorem for the Fourier Series (Theorem A.2.3) it follows that g(·) is equal to a
trigonometric polynomial (except possibly on a set of measure zero). Thus,

$$g(f) = \sum_{\eta=-n}^{n} a_\eta\, e^{i 2\pi \eta f/(2W)}, \quad f \in [-W, W] \setminus \mathcal{N}, \tag{6.83}$$

for some $n \in \mathbb{N}$; for some $2n + 1$ complex numbers $a_{-n}, \ldots, a_n$; and for some set
$\mathcal{N} \subset [-W, W]$ of Lebesgue measure zero. Since the integral in (6.79) is insensitive
to the behavior of g on the set $\mathcal{N}$, it follows from (6.79) and (6.83) that

$$\begin{aligned}
x(t) &= \int_{-W}^{W} e^{i 2\pi f t} \sum_{\eta=-n}^{n} a_\eta\, e^{i 2\pi \eta f/(2W)}\, \mathrm{d}f \\
&= \sum_{\eta=-n}^{n} a_\eta \int_{-\infty}^{\infty} e^{i 2\pi f \left(t + \frac{\eta}{2W}\right)}\, \mathrm{I}\{|f| \le W\}\, \mathrm{d}f \\
&= 2W \sum_{\eta=-n}^{n} a_\eta\, \mathrm{sinc}(2Wt + \eta), \quad t \in \mathbb{R},
\end{aligned}$$

i.e., that x is a linear combination of a finite number of time-shifted sinc(·) functions.
It now remains to show that no linear combination of a finite number of
time-shifted sinc(·) functions can be zero for all $t \in [T, 2T]$ unless it is zero for
all $t \in \mathbb{R}$. This can be established by extending the sincs to entire functions so
that the linear combination of the time-shifted sinc(·) functions is also an entire
function and by then calling again on the theorem that an entire function that has
infinitely many zeros in a bounded subset of the complex plane must be the all-zero
function.



6.9 A Theorem by Paley and Wiener 

The theorem of Paley and Wiener that we discuss next is important in the study 
of bandlimited functions, but it will not be used in this book. 

Theorem 6.8.1 showed that every energy-limited signal x that is bandlimited to W
Hz can be extended to an entire function $\xi$ satisfying (6.78) for some constant $\gamma$
by defining $\xi(z)$ as

$$\xi(z) = \int_{-W}^{W} \hat{x}(f)\, e^{i 2\pi f z}\, \mathrm{d}f, \quad z \in \mathbb{C}. \tag{6.84}$$

The theorem of Paley and Wiener that we present next can be viewed as the
reverse statement. It demonstrates that if $\xi : \mathbb{C} \to \mathbb{C}$ is an entire function that
satisfies (6.78) and whose restriction to the real axis is square integrable, then its
restriction to the real axis is an energy-limited signal that is bandlimited to W Hz
and, moreover, if we denote this restriction by x so $x(t) = \xi(t + i0)$ for all $t \in \mathbb{R}$,
then $\xi$ is given by (6.84). This theorem demonstrates the close connection between
entire functions satisfying (6.78) (functions that are called entire functions of
exponential type) and energy-limited signals that are bandlimited to W Hz.

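Theorem 6.9.1 (Paley-Wiener). If for some positive constants W and $\gamma$ the entire
function $\xi : \mathbb{C} \to \mathbb{C}$ satisfies

$$|\xi(z)| \le \gamma\, e^{2\pi W |z|}, \quad z \in \mathbb{C} \tag{6.85}$$

and if

$$\int_{-\infty}^{\infty} |\xi(t + i0)|^2\, \mathrm{d}t < \infty, \tag{6.86}$$

then there exists an energy-limited function $g : \mathbb{R} \to \mathbb{C}$ such that

$$\xi(z) = \int_{-W}^{W} g(f)\, e^{i 2\pi f z}\, \mathrm{d}f, \quad z \in \mathbb{C}. \tag{6.87}$$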



Proof. See for example, (Rudin, 1974, Theorem 19.3) or (Katznelson, 1976, Chap- 
ter VI, Section 7) or (Dym and McKean, 1972, Section 3.3). □ 



6.10 Picket Fences and Poisson Summation 

Engineering textbooks often contain a useful expression for the FT of an infinite
series of equally-spaced Dirac's Deltas. Very roughly, the result is that the FT of
the mapping

$$t \mapsto \sum_{j=-\infty}^{\infty} \delta(t + jT_s)$$

is the mapping

$$f \mapsto \frac{1}{T_s} \sum_{\eta=-\infty}^{\infty} \delta\Bigl(f + \frac{\eta}{T_s}\Bigr),$$

where $\delta(\cdot)$ denotes Dirac's Delta. Needless to say, we are being extremely informal
because we said nothing about convergence. This result is sometimes called the 
picket-fence miracle, because if we envision the plot of Dirac's Delta as an 
upward pointing bold arrow stemming from the origin, then the plot of a sum of 
shifted Delta's resembles a picket fence. The picket-fence miracle is that the FT 
of a picket fence is yet another scaled picket fence; see (Oppenheim and Willsky, 
1997, Chapter 4, Example 4.8 and also Chapter 7, Section 7.1.1.) or (Kwakernaak 
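and Sivan, 1991, Chapter 7, Example 7.4.19(c)).

In the mathematical literature, this result is called "the Poisson summation formula."
It states that under certain conditions on the function $\psi \in \mathcal{L}_1$,

$$\sum_{j=-\infty}^{\infty} \psi(jT_s) = \frac{1}{T_s} \sum_{\eta=-\infty}^{\infty} \hat{\psi}\Bigl(\frac{\eta}{T_s}\Bigr). \tag{6.88}$$

To identify the roots of (6.88) define the mapping

$$\phi(t) = \sum_{j=-\infty}^{\infty} \psi(t + jT_s), \tag{6.89}$$

and note that this function is periodic in the sense that $\phi(t + T_s) = \phi(t)$ for every
$t \in \mathbb{R}$. Consequently, it is instructive to study its Fourier Series on the interval
$[-T_s/2, T_s/2]$ (Note A.3.5 in the appendix). Its $\eta$-th Fourier Series Coefficient with
respect to the interval $[-T_s/2, T_s/2]$ is given by

$$\begin{aligned}
\int_{-T_s/2}^{T_s/2} \phi(t)\, \frac{1}{\sqrt{T_s}}\, e^{-i 2\pi \eta t/T_s}\, \mathrm{d}t
&= \frac{1}{\sqrt{T_s}} \int_{-T_s/2}^{T_s/2} \sum_{j=-\infty}^{\infty} \psi(t + jT_s)\, e^{-i 2\pi \eta t/T_s}\, \mathrm{d}t \\
&= \frac{1}{\sqrt{T_s}} \sum_{j=-\infty}^{\infty} \int_{-T_s/2 + jT_s}^{T_s/2 + jT_s} \psi(\tau)\, e^{-i 2\pi \eta (\tau - jT_s)/T_s}\, \mathrm{d}\tau \\
&= \frac{1}{\sqrt{T_s}} \sum_{j=-\infty}^{\infty} \int_{-T_s/2 + jT_s}^{T_s/2 + jT_s} \psi(\tau)\, e^{-i 2\pi \eta \tau/T_s}\, \mathrm{d}\tau \\
&= \frac{1}{\sqrt{T_s}} \int_{-\infty}^{\infty} \psi(\tau)\, e^{-i 2\pi \eta \tau/T_s}\, \mathrm{d}\tau \\
&= \frac{1}{\sqrt{T_s}}\, \hat{\psi}\Bigl(\frac{\eta}{T_s}\Bigr), \quad \eta \in \mathbb{Z},
\end{aligned}$$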



where the first equality follows from the definition of $\phi(\cdot)$ (6.89); the second by
swapping the summation and the integration and by defining $\tau = t + jT_s$; the third
by the periodicity of the complex exponential; the fourth because summing the
integrals over disjoint intervals whose union is $\mathbb{R}$ is just the integral over $\mathbb{R}$; and
the final equality from the definition of the FT.

We can thus interpret the RHS of (6.88) as the evaluation$^{12}$ at $t = 0$ of the Fourier
Series of $\phi(\cdot)$ and the LHS as the evaluation of $\phi(\cdot)$ at $t = 0$. Having established
the origin of the Poisson summation formula, we can now readily state conditions
that guarantee that it holds. An example of a set of conditions that guarantees
(6.88) is the following:

1) The function $\psi(\cdot)$ is integrable.

2) The RHS of (6.89) converges at $t = 0$.

3) The Fourier Series of $\phi(\cdot)$ converges at $t = 0$ to the value of $\phi(\cdot)$ at $t = 0$.



12 At $t = 0$ the complex exponentials are all equal to one, and the Fourier Series is thus just
the sum of the Fourier Series Coefficients.

We draw the reader's attention to the fact that it is not enough that both sides of
(6.88) converge absolutely and that both $\psi(\cdot)$ and $\hat{\psi}(\cdot)$ be continuous; see (Katznelson,
1976, Chapter VI, Section 1, Exercise 15).
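
A quick numerical illustration of (6.88) is possible for a function that comfortably satisfies the three conditions. The sketch below is our own example, not taken from the text: it assumes $\psi(t) = e^{-\pi t^2}$, whose FT is the standard pair $\hat{\psi}(f) = e^{-\pi f^2}$, truncates both sums to a finite number of terms, and uses the hypothetical helper name `poisson_sides`.

```python
import math


def psi(t: float) -> float:
    """Assumed test function psi(t) = exp(-pi t^2)."""
    return math.exp(-math.pi * t * t)


def psi_hat(f: float) -> float:
    """FT of psi; the Gaussian exp(-pi t^2) is its own FT (standard pair)."""
    return math.exp(-math.pi * f * f)


def poisson_sides(Ts: float, terms: int = 50) -> tuple:
    """Truncated LHS and RHS of the Poisson summation formula (6.88)."""
    lhs = sum(psi(j * Ts) for j in range(-terms, terms + 1))
    rhs = sum(psi_hat(eta / Ts) for eta in range(-terms, terms + 1)) / Ts
    return lhs, rhs


if __name__ == "__main__":
    for Ts in (0.5, 1.0, 2.0):
        lhs, rhs = poisson_sides(Ts)
        print(Ts, lhs, rhs, abs(lhs - rhs))
```

The truncation is harmless here because the Gaussian terms decay extremely fast; for slowly decaying $\psi$ the truncated sums would need far more terms, and the formula itself may fail without conditions such as those listed above.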

A setting where the above conditions are satisfied and where (6.88) thus holds is 
given in the following proposition. 

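Proposition 6.10.1. Let $\psi(\cdot)$ be a continuous function satisfying

$$\psi(t) = \begin{cases} 0 & \text{if } |t| \ge T, \\ \int_{-T}^{t} \xi(\tau)\, \mathrm{d}\tau & \text{otherwise}, \end{cases} \tag{6.90a}$$

where

$$\int_{-T}^{T} |\xi(\tau)|^2\, \mathrm{d}\tau < \infty, \tag{6.90b}$$

and where $T > 0$ is some constant. Then for any $T_s > 0$

$$\sum_{j=-\infty}^{\infty} \psi(jT_s) = \frac{1}{T_s} \sum_{\eta=-\infty}^{\infty} \hat{\psi}\Bigl(\frac{\eta}{T_s}\Bigr). \tag{6.90c}$$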

j=— oo rj— — oo 



Proof. The integrability of $\psi(\cdot)$ follows because $\psi(\cdot)$ is continuous and zero outside
a finite interval. That the RHS of (6.89) converges at $t = 0$ follows because the
fact that $\psi(\cdot)$ is zero outside the interval $[-T, +T]$ implies that only a finite number
of terms contribute to the sum at $t = 0$. That the Fourier Series of $\phi(\cdot)$ converges
at $t = 0$ to the value of $\phi(\cdot)$ at $t = 0$ follows from (Katznelson, 1976, Chapter 1,
Section 6, Paragraph 6.2, Equation (6.2)) and from the corollary in (Katznelson,
1976, Chapter 1, Section 3, Paragraph 3.1). □



6.11 Additional Reading 

There are a number of excellent books on Fourier Analysis. We mention here 
(Katznelson, 1976), (Dym and McKean, 1972), (Pinsky, 2002), and (Korner, 1988). 
In particular, readers who would like to better understand how the FT is defined for 
energy-limited functions that are not integrable may wish to consult (Katznelson, 
1976, Section VI 3.1) or (Dym and McKean, 1972, Sections 2.3-2.5). Numerous 
surprising applications of the FT can be found in (Korner, 1988). 

Engineers often speak of the 2WT degrees of freedom that signals that are band- 
limited and time-limited have. A good starting point for the literature on this is 
(Slepian, 1976). 

Bandlimited functions are intimately related to "entire functions of exponential 
type." For an accessible introduction to this concept see (Requicha, 1980); for a 
more mathematical approach see (Boas, 1954).



6.12 Exercises

Exercise 6.1 (Symmetries of the FT). Let $x : \mathbb{R} \to \mathbb{C}$ be integrable, and let $\hat{x}$ be its FT.

(i) Show that if x is a real signal, then $\hat{x}$ is conjugate symmetric, i.e., $\hat{x}(-f) = \hat{x}^*(f)$,
for every $f \in \mathbb{R}$.

(ii) Show that if x is purely imaginary (i.e., takes on only purely imaginary values),
then $\hat{x}$ is conjugate antisymmetric, i.e., $\hat{x}(-f) = -\hat{x}^*(f)$, for every $f \in \mathbb{R}$.

(iii) Show that $\hat{x}$ can be written uniquely as the sum of a conjugate-symmetric function
$g_{\mathrm{cs}}$ and a conjugate-antisymmetric function $g_{\mathrm{cas}}$. Express $g_{\mathrm{cs}}$ and $g_{\mathrm{cas}}$ in terms of $\hat{x}$.

Exercise 6.2 (Reconstructing a Function from Its IFT). Formulate and prove a result
analogous to Theorem 6.2.12 for the Inverse Fourier Transform.

Exercise 6.3 (Eigenfunctions of the FT). Show that if the energy-limited signal x satisfies
$\hat{x} = \lambda x$ for some $\lambda \in \mathbb{C}$, then $\lambda$ can only be $\pm 1$ or $\pm i$. (The Hermite functions are such
signals.)

Exercise 6.4 (Existence of a Stable Filter (1)). Let $W > 0$ be given. Does there exist a
stable filter whose frequency response is zero for $|f| < W$ and is one for $W < f < 2W$?

Exercise 6.5 (Existence of a Stable Filter (2)). Let $W > 0$ be given. Does there exist a
stable filter whose frequency response is given by $\cos(f)$ for all $|f| > W$?

Exercise 6.6 (Existence of an Energy-Limited Signal). Argue that there exists an energy-
limited signal x whose FT is (the equivalence class of) the mapping $f \mapsto e^{-f}\, \mathrm{I}\{f \ge 0\}$.
What is the energy in x? What is the energy in the result of feeding x to an ideal unit-gain
lowpass filter of cutoff frequency $W_c = 1$?

Exercise 6.7 (Passive Filters). Let h be the impulse response of a stable filter. Show that
the condition that "for every $x \in \mathcal{L}_2$ the energy in $x \star h$ does not exceed the energy in x"
is equivalent to the condition

$$|\hat{h}(f)| \le 1, \quad f \in \mathbb{R}.$$

Exercise 6.8 (Real and Imaginary Parts of Bandlimited Signals). Show that if x(·) is an
integrable signal that is bandlimited to W Hz, then its real and imaginary parts are also
integrable signals that are bandlimited to W Hz.

Exercise 6.9 (Inner Products and Filtering). Let x be an energy-limited signal that is
bandlimited to W Hz. Show that

$$\langle x, y \rangle = \langle x, y \star \mathrm{LPF}_W \rangle, \quad y \in \mathcal{L}_2.$$

Exercise 6.10 (Squaring a Signal). Show that if x is an energy-limited signal that is
bandlimited to W Hz, then $t \mapsto x^2(t)$ is an integrable signal that is bandlimited to 2W
Hz.

Exercise 6.11 (Squared sinc(·)). Find the FT and IFT of the mapping $t \mapsto \mathrm{sinc}^2(t)$.

Exercise 6.12 (A Stable Filter). Show that the IFT of the function

$$g_0 : f \mapsto \begin{cases} 1 & \text{if } |f| \le a, \\ \dfrac{b - |f|}{b - a} & \text{if } a < |f| < b, \\ 0 & \text{otherwise} \end{cases}$$

is given by

$$\check{g}_0 : t \mapsto \frac{1}{(\pi t)^2}\, \frac{\cos(2\pi a t) - \cos(2\pi b t)}{2(b - a)}$$

and that this signal is integrable. Here $b > a > 0$.

Exercise 6.13 (Multiplying Bandlimited Signals by a Carrier). Let x be an integrable
signal that is bandlimited to W Hz.

(i) Show that if $f_c > W$, then

$$\int_{-\infty}^{\infty} x(t) \cos(2\pi f_c t)\, \mathrm{d}t = \int_{-\infty}^{\infty} x(t) \sin(2\pi f_c t)\, \mathrm{d}t = 0.$$

(ii) Show that if $f_c > W/2$, then

$$\int_{-\infty}^{\infty} x(t) \cos^2(2\pi f_c t)\, \mathrm{d}t = \frac{1}{2} \int_{-\infty}^{\infty} x(t)\, \mathrm{d}t.$$

Exercise 6.14 (An Identity). Prove that for every $W \in \mathbb{R}$

$$\mathrm{sinc}(2Wt) \cos(2\pi W t) = \mathrm{sinc}(4Wt), \quad t \in \mathbb{R}.$$

Illustrate the identity in the frequency domain.

Exercise 6.15 (Picket Fences). If you are familiar with Dirac's Delta, explain how (6.88) is
related to the heuristic statement that the FT of $\sum_{j \in \mathbb{Z}} \delta(t + jT_s)$ is $T_s^{-1} \sum_{\eta \in \mathbb{Z}} \delta(f + \eta/T_s)$.

Exercise 6.16 (Bounding the Derivative). Show that if x is an energy-limited signal that
is bandlimited to W Hz, then its time-t derivative $x'(t)$ satisfies

$$|x'(t)| \le \sqrt{\frac{8}{3}}\, \pi\, W^{3/2}\, \|x\|_2, \quad t \in \mathbb{R}.$$

Hint: Use Proposition 6.4.5 and the Cauchy-Schwarz Inequality.

Exercise 6.17 (Another Notion of Bandwidth). Let $\mathcal{U}$ denote the set of all energy-limited
signals u such that at least 90% of the energy of u is contained in the band $[-W, W]$.
Is $\mathcal{U}$ a linear subspace of $\mathcal{L}_2$?



Chapter 7 

Passband Signals and Their Representation 

7.1 Introduction 

The signals encountered in wireless communications are typically real passband 
signals. In this chapter we shall define such signals and define their bandwidth 
around a carrier frequency. We shall then explain how such signals can be rep- 
resented using their complex baseband representation. We shall emphasize two 
relationships: that between the energy in the passband signal and in its baseband 
representation, and that between the bandwidth of the passband signal around the 
carrier frequency and the bandwidth of its baseband representation. We ask the 
reader to pay special attention to the fact that only real passband signals have a 
baseband representation. 

Most of the chapter deals with the family of integrable passband signals. As we
shall see in Corollary 7.2.4, an integrable passband signal must have finite energy,
and this family is thus a subset of the family of energy-limited passband signals.
Restricting ourselves to integrable signals, while reducing the generality of some of
the results, simplifies the exposition because we can discuss the Fourier Transform
without having to resort to the $\mathcal{L}_2$-Fourier Transform, which requires all statements
to be phrased in terms of equivalence classes. But most of the derived results will
also hold for the more general family of energy-limited passband signals with only
slight modifications. The required modifications are discussed in Section 7.7.



7.2 Baseband and Passband Signals 

Integrable signals that are bandlimited to W Hz were defined in Definition 6.4.9. By
Proposition 6.4.10, an integrable signal x is bandlimited to W Hz if it is continuous
and if its FT is zero for all frequencies outside the band $[-W, W]$. The bandwidth
of x is the smallest W to which it is bandlimited (Definition 6.4.13). As an example,
Figure 7.1 depicts the FT $\hat{x}$ of a real signal x, which is bandlimited to W Hz.
Since the signal x in this example is real, its FT is conjugate-symmetric (i.e.,
$\hat{x}(-f) = \hat{x}^*(f)$ for all frequencies $f \in \mathbb{R}$). Thus, the magnitude of $\hat{x}$ is symmetric
(even), i.e., $|\hat{x}(f)| = |\hat{x}(-f)|$, but its phase is anti-symmetric (odd). In the figure
dashed lines indicate this conjugate symmetry.

Figure 7.1: The FT $\hat{x}$ of a real bandwidth-W baseband signal x.

Figure 7.2: The FT $\hat{y}$ of a real passband signal y that is bandlimited to W Hz
around the carrier frequency $f_c$.



Consider now the real signal y whose FT $\hat{y}$ is depicted in Figure 7.2. Again, since
the signal is real, its FT is conjugate-symmetric, and hence the dashed lines. This
signal (if continuous) is bandlimited to $f_c + W/2$ Hz. But note that $\hat{y}(f) = 0$ for all
frequencies f in the interval $|f| < f_c - W/2$. Signals such as y are often encountered
in wireless communication, because in a wireless channel the very low frequencies
often suffer severe attenuation and are therefore seldom used. Another reason
is the concurrent use of the wireless spectrum by many systems. If all systems 
transmitted in the same frequency band, they would interfere with each other. 
Consequently, different systems are often assigned different carrier frequencies so 
that their transmitted signals will not overlap in frequency. This is why different 
radio stations transmit around different carrier frequencies. 



7.2.1 Definition and Characterization 

To describe signals such as y we use the following definition for passband signals.
We ask the reader to recall the definition of the impulse response $\mathrm{BPF}_{W,f_c}(\cdot)$ (see
(5.21)) and of the frequency response $\widehat{\mathrm{BPF}}_{W,f_c}(\cdot)$ (see (6.41)) of the ideal unit-gain
bandpass filter of bandwidth W around the carrier frequency $f_c$.

Definition 7.2.1 (A Passband Signal). A signal $x_{\mathrm{PB}}$ is said to be an integrable
passband signal that is bandlimited to W Hz around the carrier frequency
$f_c$ if it is integrable

$$x_{\mathrm{PB}} \in \mathcal{L}_1; \tag{7.1a}$$

the carrier frequency $f_c$ satisfies

$$f_c > \frac{W}{2} > 0; \tag{7.1b}$$

and if $x_{\mathrm{PB}}$ is unaltered when it is fed to an ideal unit-gain bandpass filter of bandwidth
W around the carrier frequency $f_c$

$$x_{\mathrm{PB}}(t) = (x_{\mathrm{PB}} \star \mathrm{BPF}_{W,f_c})(t), \quad t \in \mathbb{R}. \tag{7.1c}$$

An energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency $f_c$ is analogously defined but with (7.1a) replaced by the
condition

$$x_{\mathrm{PB}} \in \mathcal{L}_2. \tag{7.1a'}$$

(That the convolution in (7.1c) is defined at every $t \in \mathbb{R}$ whenever $x_{\mathrm{PB}}$ is integrable
can be shown using Proposition 6.2.5 because $\mathrm{BPF}_{W,f_c}$ is the Inverse Fourier Transform
of the integrable function $f \mapsto \mathrm{I}\{\,||f| - f_c| \le W/2\,\}$. That the convolution is
defined at every $t \in \mathbb{R}$ also when $x_{\mathrm{PB}}$ is of finite energy can be shown by noting
that $\mathrm{BPF}_{W,f_c}$ is of finite energy, and the convolution of two finite-energy signals is
defined at every time $t \in \mathbb{R}$; see Section 5.5.)

In analogy to Proposition 6.4.10 we have the following characterization: 

Proposition 7.2.2 (Characterizing Integrable Passband Signals). Let $f_c$ and W
satisfy $f_c > W/2 > 0$. If $x_{\mathrm{PB}}$ is an integrable signal, then each of the following
statements is equivalent to the statement that $x_{\mathrm{PB}}$ is an integrable passband signal
that is bandlimited to W Hz around the carrier frequency $f_c$.

(a) The signal $x_{\mathrm{PB}}$ is unaltered when it is bandpass filtered:

$$x_{\mathrm{PB}}(t) = (x_{\mathrm{PB}} \star \mathrm{BPF}_{W,f_c})(t), \quad t \in \mathbb{R}. \tag{7.2}$$

(b) The signal $x_{\mathrm{PB}}$ can be expressed as

$$x_{\mathrm{PB}}(t) = \int_{||f| - f_c| \le W/2} \hat{x}_{\mathrm{PB}}(f)\, e^{i 2\pi f t}\, \mathrm{d}f, \quad t \in \mathbb{R}. \tag{7.3}$$

(c) The signal $x_{\mathrm{PB}}$ is continuous and

$$\hat{x}_{\mathrm{PB}}(f) = 0, \quad \bigl||f| - f_c\bigr| > \frac{W}{2}. \tag{7.4}$$

(d) There exists an integrable function g such that

$$x_{\mathrm{PB}}(t) = \int_{||f| - f_c| \le W/2} g(f)\, e^{i 2\pi f t}\, \mathrm{d}f, \quad t \in \mathbb{R}. \tag{7.5}$$

Proof. The proof is similar to the proof of Proposition 6.4.10 and is omitted. □

7.2.2 Important Properties

By comparing (7.4) with (6.62) we obtain: 

Corollary 7.2.3 (Passband Signals Are Bandlimited). If $x_{\mathrm{PB}}$ is an integrable passband
signal that is bandlimited to W Hz around the carrier frequency $f_c$, then it is
an integrable signal that is bandlimited to $f_c + W/2$ Hz.

Using Corollary 7.2.3 and Note 6.4.12 we obtain:

Corollary 7.2.4 (Integrable Passband Signals Are of Finite Energy). Any integrable
passband signal that is bandlimited to W Hz around the carrier frequency $f_c$
is of finite energy.

Proposition 7.2.5 (Integrable Passband Signals through Stable Filters). If $x_{\mathrm{PB}}$
is an integrable passband signal that is bandlimited to W Hz around the carrier
frequency $f_c$, and if $h \in \mathcal{L}_1$ is the impulse response of a stable filter, then the
convolution $x_{\mathrm{PB}} \star h$ is defined at every epoch; it is an integrable passband signal
that is bandlimited to W Hz around the carrier frequency $f_c$; and its FT is the
mapping $f \mapsto \hat{x}_{\mathrm{PB}}(f)\, \hat{h}(f)$.

Proof. The proof is similar to the proof of the analogous result for bandlimited
signals (Proposition 6.5.2) and is omitted. □

7.3 Bandwidth around a Carrier Frequency 

Definition 7.3.1 (The Bandwidth around a Carrier Frequency). The bandwidth
around the carrier $f_c$ of an integrable or energy-limited passband signal $x_{\mathrm{PB}}$ is
the smallest W for which both (7.1b) and (7.1c) hold.

Note 7.3.2 (The Carrier Frequency Is Critical). The bandwidth of $x_{\mathrm{PB}}$ around
the carrier frequency $f_c$ is determined not only by the FT of $x_{\mathrm{PB}}$ but also by $f_c$.

For example, the real passband signal whose FT is depicted in Figure 7.3 is of
bandwidth W around the carrier frequency $f_c$, but its bandwidth is smaller around
a slightly higher carrier frequency.

At first it may seem that the definition of bandwidth for passband signals is incon- 
sistent with the definition for baseband signals. This, however, is not the case. A 
good way to remember the definitions is to focus on real signals. For such signals 
the bandwidth for both baseband and passband signals is defined as the length of 
an interval of positive frequencies where the FT of the signal may be nonzero. For 
baseband signals the bandwidth is the length of the smallest interval of positive 
frequencies of the form [0, W] containing all positive frequencies where the FT may 
be nonzero. For passband signals it is the length of the smallest interval of positive
frequencies that is symmetric around the carrier frequency $f_c$ and that contains
all positive frequencies where the FT may be nonzero. (For complex signals we
have to allow for the fact that the zeros of the FT may not be symmetric sets
around the origin.) See also Figures 6.2 and 6.3.

Figure 7.3: The FT of a complex baseband signal of bandwidth W Hz (above)
and of a real passband signal of bandwidth W Hz around the carrier frequency $f_c$
(below).



We draw the reader's attention to an important consequence of our definition of 
bandwidth: 

Proposition 7.3.3 (Multiplication by a Carrier Doubles the Bandwidth). If $\mathbf{x}$ is
an integrable signal of bandwidth W Hz and if $f_c > W$, then $t \mapsto x(t)\cos(2\pi f_c t)$ is
an integrable passband signal of bandwidth 2W around the carrier frequency $f_c$.

Proof. Define $\mathbf{y}\colon t \mapsto x(t)\cos(2\pi f_c t)$. The proposition is a straightforward
consequence of the definition of the bandwidth of $\mathbf{x}$ (Definition 6.4.13); the definition of
the bandwidth of $\mathbf{y}$ around the carrier frequency $f_c$ (Definition 7.3.1); and the fact
that if $\mathbf{x}$ is a continuous integrable signal of FT $\hat{\mathbf{x}}$, then $\mathbf{y}$ is a continuous integrable
signal of FT

$$\hat{y}(f) = \frac{1}{2}\bigl(\hat{x}(f - f_c) + \hat{x}(f + f_c)\bigr), \quad f \in \mathbb{R}, \tag{7.6}$$

where (7.6) follows from the calculation

$$\begin{aligned}
\hat{y}(f) &= \int_{-\infty}^{\infty} y(t)\, e^{-i2\pi f t}\, dt \\
&= \int_{-\infty}^{\infty} x(t) \cos(2\pi f_c t)\, e^{-i2\pi f t}\, dt \\
&= \int_{-\infty}^{\infty} x(t)\, \frac{e^{i2\pi f_c t} + e^{-i2\pi f_c t}}{2}\, e^{-i2\pi f t}\, dt \\
&= \frac{1}{2} \int_{-\infty}^{\infty} x(t)\, e^{-i2\pi (f - f_c) t}\, dt + \frac{1}{2} \int_{-\infty}^{\infty} x(t)\, e^{-i2\pi (f + f_c) t}\, dt \\
&= \frac{1}{2}\bigl(\hat{x}(f - f_c) + \hat{x}(f + f_c)\bigr), \quad f \in \mathbb{R}.
\end{aligned}$$

Figure 7.4: The FT of a complex baseband bandwidth-W signal $\mathbf{x}$.

Figure 7.5: The FT of $\mathbf{y}\colon t \mapsto x(t)\cos(2\pi f_c t)$, where $\mathbf{x}$ is as depicted in Figure 7.4.
Note that $\mathbf{x}$ is of bandwidth W and that $\mathbf{y}$ is of bandwidth 2W around the carrier
frequency $f_c$.

As an illustration of the relation (7.6) note that if $\mathbf{x}$ is the complex bandwidth-W
signal whose FT is depicted in Figure 7.4, then the signal $\mathbf{y}\colon t \mapsto x(t)\cos(2\pi f_c t)$ is
the complex passband signal of bandwidth 2W around $f_c$ whose FT is depicted in
Figure 7.5.

Similarly, if $\mathbf{x}$ is the real baseband signal of bandwidth W whose FT is depicted
in Figure 7.6, then $\mathbf{y}\colon t \mapsto x(t)\cos(2\pi f_c t)$ is the real passband signal of bandwidth
2W around $f_c$ whose FT is depicted in Figure 7.7. □
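The relation (7.6) has an exact discrete counterpart that can be checked numerically. The following is a minimal sketch, not part of the original text: it replaces the FT by the DFT of one period of a periodic signal, and the names `N`, `k_c`, and `K` (period length, carrier bin, and baseband bandwidth in bins) are illustrative assumptions.

```python
import numpy as np

N = 1024                      # samples per period (assumption)
n = np.arange(N)
k_c = 200                     # carrier frequency in DFT bins, plays the role of f_c
K = 10                        # x occupies bins -K..K, plays the role of W

# A real baseband signal built from a few low-frequency sinusoids.
rng = np.random.default_rng(0)
amps = rng.normal(size=K + 1)
phases = rng.uniform(0, 2 * np.pi, size=K + 1)
x = sum(a * np.cos(2 * np.pi * k * n / N + p)
        for k, (a, p) in enumerate(zip(amps, phases)))

# Multiplication by the carrier.
y = x * np.cos(2 * np.pi * k_c * n / N)

X = np.fft.fft(x)
Y = np.fft.fft(y)

# Discrete counterpart of (7.6): Y[k] = (X[k - k_c] + X[k + k_c]) / 2.
Y_pred = 0.5 * (np.roll(X, k_c) + np.roll(X, -k_c))

# Positive-frequency bins where Y is (numerically) nonzero.
occupied = np.flatnonzero(np.abs(Y[: N // 2]) > 1e-9 * np.abs(Y).max())
print(np.allclose(Y, Y_pred), occupied.min(), occupied.max())
```

In this discrete setting the occupied positive-frequency bins lie within `k_c - K` to `k_c + K`, an interval of width `2 * K` centered at the carrier bin, which mirrors the doubling of the bandwidth asserted in Proposition 7.3.3.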



In wireless applications the bandwidth W of the signals around the carrier frequency
is typically much smaller than the carrier frequency $f_c$, but for most of our results
it suffices that (7.1b) hold.

Figure 7.6: The FT of a real baseband bandwidth-W signal $\mathbf{x}$.

Figure 7.7: The FT of $\mathbf{y}\colon t \mapsto x(t)\cos(2\pi f_c t)$, where $\mathbf{x}$ is as depicted in Figure 7.6.
Here $\mathbf{x}$ is of bandwidth W and $\mathbf{y}$ is of bandwidth 2W around the carrier frequency $f_c$.

The notion of a passband signal is also applied somewhat loosely in instances where
the signals are not bandlimited. Engineers say that an energy-limited signal is a
passband signal around the carrier frequency $f_c$ if most of its energy is contained
in frequencies that are close to $f_c$ and $-f_c$. Notice that in this "definition" we are
relying heavily on Parseval's theorem. I.e., we think about the energy $\|\mathbf{x}\|_2^2$ of $\mathbf{x}$ as
being computed in the frequency domain, i.e., by computing $\|\hat{\mathbf{x}}\|_2^2 = \int |\hat{x}(f)|^2\, df$.
By "most of the energy is contained in frequencies that are close to $f_c$ and $-f_c$"
we thus mean that most of the contributions to this integral come from small
frequency intervals around $f_c$ and $-f_c$. In other words, we say that $\mathbf{x}$ is a passband
signal whose energy is mostly concentrated in a bandwidth W around the carrier
frequency $f_c$ if

$$\int_{\bigl||f| - f_c\bigr| \le W/2} |\hat{x}(f)|^2\, df \approx \int_{-\infty}^{\infty} |\hat{x}(f)|^2\, df. \tag{7.7}$$

Similarly, a signal is approximately a baseband signal that is bandlimited to W Hz
if

$$\int_{-W}^{W} |\hat{x}(f)|^2\, df \approx \int_{-\infty}^{\infty} |\hat{x}(f)|^2\, df. \tag{7.8}$$
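The loose notion in (7.7) can be made concrete by computing which fraction of a signal's energy falls within W/2 of the carrier. The sketch below is only an illustration under stated assumptions: a Gaussian envelope (which is not bandlimited), a sampling rate `fs`, and an FFT-based approximation of the energy spectrum; none of these choices come from the text.

```python
import numpy as np

fs = 1000.0                   # sampling rate in Hz (assumption)
N = 4096
t = (np.arange(N) - N // 2) / fs
f_c = 100.0                   # carrier frequency in Hz (assumption)
sigma = 0.05                  # width of the Gaussian envelope in seconds (assumption)

# A signal that is not bandlimited but whose energy sits near +f_c and -f_c.
x = np.exp(-t**2 / (2 * sigma**2)) * np.cos(2 * np.pi * f_c * t)

X = np.fft.fft(x)
f = np.fft.fftfreq(N, d=1 / fs)
energy_density = np.abs(X) ** 2

def energy_fraction(W):
    """Fraction of the energy at frequencies with | |f| - f_c | <= W/2, cf. (7.7)."""
    in_band = np.abs(np.abs(f) - f_c) <= W / 2
    return energy_density[in_band].sum() / energy_density.sum()

frac_wide = energy_fraction(20.0)     # W = 20 Hz
frac_narrow = energy_fraction(2.0)    # W = 2 Hz
print(frac_wide, frac_narrow)
```

For these parameters almost all of the energy lies within 10 Hz of the carrier, so the signal would loosely be called a passband signal of bandwidth 20 Hz around `f_c`, whereas a bandwidth of 2 Hz captures well under half of the energy.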



7.4 Real Passband Signals 

Before discussing the baseband representation of real passband signals we empha- 
size the following. 

(i) The passband signals transmitted and received in Digital Communications 
are real. 

(ii) Only real passband signals have a baseband representation. 

(iii) The baseband representation of a real passband signal is typically a complex 
signal. 

(iv) While the FT of real signals is conjugate-symmetric (6.3), this does not imply 
any symmetry with respect to the carrier frequency. Thus, the FT depicted 
in Figure 7.2 and the one depicted in Figure 7.7 both correspond to real 
passband signals. (The former is bandlimited to W Hz around $f_c$ and the
latter to 2W around $f_c$.)

We also note that if $\mathbf{x}$ is a real integrable signal, then its FT must be conjugate-symmetric.
But if $\mathbf{g} \in \mathcal{L}_1$ is such that its IFT $\check{\mathbf{g}}$ is real, it does not follow that $\mathbf{g}$
must be conjugate-symmetric. For example, the conjugate symmetry could be
broken on a set of frequencies of Lebesgue measure zero, a set that does not influence
the IFT. As the next proposition shows, this is the only way the conjugate
symmetry can be broken.

Proposition 7.4.1. If $\mathbf{x}$ is a real signal and if $\mathbf{x} = \check{\mathbf{g}}$ for some integrable function
$\mathbf{g}\colon f \mapsto g(f)$, then:

(i) The signal $\mathbf{x}$ can be represented as the IFT of a conjugate-symmetric integrable
function.

(ii) The function $\mathbf{g}$ and the conjugate-symmetric function $f \mapsto \bigl(g(f) + g^*(-f)\bigr)/2$
agree except on a set of frequencies of Lebesgue measure zero.

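Proof. Since $\mathbf{x}$ is real and since $\mathbf{x} = \check{\mathbf{g}}$ it follows that

$$\begin{aligned}
x(t) &= \mathrm{Re}\bigl(x(t)\bigr) \\
&= \frac{1}{2} x(t) + \frac{1}{2} x^*(t) \\
&= \frac{1}{2} \int_{-\infty}^{\infty} g(f)\, e^{i2\pi f t}\, df + \frac{1}{2} \left( \int_{-\infty}^{\infty} g(f)\, e^{i2\pi f t}\, df \right)^{*} \\
&= \frac{1}{2} \int_{-\infty}^{\infty} g(f)\, e^{i2\pi f t}\, df + \frac{1}{2} \int_{-\infty}^{\infty} g^*(f)\, e^{-i2\pi f t}\, df \\
&= \frac{1}{2} \int_{-\infty}^{\infty} g(f)\, e^{i2\pi f t}\, df + \frac{1}{2} \int_{-\infty}^{\infty} g^*(-\tilde{f})\, e^{i2\pi \tilde{f} t}\, d\tilde{f} \\
&= \int_{-\infty}^{\infty} \frac{g(f) + g^*(-f)}{2}\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R},
\end{aligned}$$

where the first equality follows from the hypothesis that $\mathbf{x}$ is a real signal; the second
because for any $z \in \mathbb{C}$ we have $\mathrm{Re}(z) = (z + z^*)/2$; the third by the hypothesis
that $\mathbf{x} = \check{\mathbf{g}}$; the fourth because conjugating a complex integral is tantamount
to conjugating the integrand (Proposition 2.3.1 (ii)); the fifth by changing the
integration variable in the second integral to $\tilde{f} = -f$; and the sixth by combining
the integrals. Thus, $\mathbf{x}$ is the IFT of the conjugate-symmetric function defined by
$f \mapsto \bigl(g(f) + g^*(-f)\bigr)/2$, and (i) is established.

As to (ii), since $\mathbf{x}$ is the IFT of both $\mathbf{g}$ and $f \mapsto \bigl(g(f) + g^*(-f)\bigr)/2$, it follows from
the IFT analog of Theorem 6.2.12 that the two agree outside a set of Lebesgue
measure zero. □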



7.5 The Analytic Signal 

In this section we shall define the analytic representation of a real passband 
signal. This is also sometimes called the analytic signal associated with the 
signal. We shall use the two terms interchangeably. The analytic representation 
will serve as a steppingstone to the baseband representation, which is extremely 
important in Digital Communications. We emphasize that an analytic signal can 
only be associated with a real passband signal. The analytic signal itself, however,
is complex-valued.

7.5.1 Definition and Characterization 

Let $\mathbf{x}_{\mathrm{PB}}$ be a real integrable passband signal that is bandlimited to W Hz around
the carrier frequency $f_c$. We would have liked to define its analytic representation
as the complex signal $\mathbf{x}_{\mathrm{A}}$ whose FT is the mapping

$$f \mapsto \hat{x}_{\mathrm{PB}}(f)\, \mathrm{I}\{f \ge 0\}, \tag{7.9}$$

i.e., as the integrable signal whose FT is equal to zero at negative frequencies and to
$\hat{x}_{\mathrm{PB}}(f)$ at nonnegative frequencies. While this is often the way we think about $\mathbf{x}_{\mathrm{A}}$,
there are two problems with this definition: an existence problem and a uniqueness
problem. It is not prima facie clear that there exists an integrable signal whose FT
is the mapping (7.9). (We shall soon see that there does.) And, since two signals
that differ on a set of Lebesgue measure zero have identical Fourier Transforms, the
above definition would not fully specify $\mathbf{x}_{\mathrm{A}}$. This could be remedied by insisting
that $\mathbf{x}_{\mathrm{A}}$ be continuous, but this would further exacerbate the existence issue. (We
shall see that there does exist a unique integrable continuous signal whose FT is
the mapping (7.9), but this requires proof.) Our approach is to define $\mathbf{x}_{\mathrm{A}}$ as the
IFT of the mapping (7.9) and to then explore the properties of $\mathbf{x}_{\mathrm{A}}$.

Definition 7.5.1 (Analytic Representation of a Real Passband Signal). The analytic
representation of a real integrable passband signal $\mathbf{x}_{\mathrm{PB}}$ that is bandlimited
to W Hz around the carrier frequency $f_c$ is the complex signal $\mathbf{x}_{\mathrm{A}}$ defined by

$$x_{\mathrm{A}}(t) = \int_{0}^{\infty} \hat{x}_{\mathrm{PB}}(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{7.10}$$

Note that, by Proposition 7.2.2, $\hat{x}_{\mathrm{PB}}(f)$ vanishes at frequencies $f$ that satisfy
$\bigl||f| - f_c\bigr| > W/2$, so we can also write (7.10) as

$$x_{\mathrm{A}}(t) = \int_{f_c - \frac{W}{2}}^{f_c + \frac{W}{2}} \hat{x}_{\mathrm{PB}}(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{7.11}$$

This latter expression has the advantage that it makes it clear that the integral
is well-defined for every $t \in \mathbb{R}$, because the integrability of $\mathbf{x}_{\mathrm{PB}}$ implies that the
integrand is bounded, i.e., that $|\hat{x}_{\mathrm{PB}}(f)| \le \|\mathbf{x}_{\mathrm{PB}}\|_1$ for every $f \in \mathbb{R}$ (Theorem 6.2.11)
and hence that the mapping $f \mapsto \hat{x}_{\mathrm{PB}}(f)\, \mathrm{I}\{|f - f_c| \le W/2\}$ is integrable.

Also note that our definition of the analytic signal may be off by a factor of two
or $\sqrt{2}$ from the one used in some textbooks. (Some textbooks introduce a factor
of $\sqrt{2}$ in order to make the energy in the analytic signal equal that in the passband
signal. We do not do so and hence end up with a factor of two in (7.23) ahead.)

We next show that the analytic signal $\mathbf{x}_{\mathrm{A}}$ is a continuous and integrable signal and
that its FT is given by the mapping (7.9). In fact, we prove more.

Proposition 7.5.2 (Characterizations of the Analytic Signal). Let $\mathbf{x}_{\mathrm{PB}}$ be a real
integrable passband signal that is bandlimited to W Hz around the carrier frequency
$f_c$. Then each of the following statements is equivalent to the statement
that the complex-valued signal $\mathbf{x}_{\mathrm{A}}$ is its analytic representation.

(a) The signal $\mathbf{x}_{\mathrm{A}}$ is given by

$$x_{\mathrm{A}}(t) = \int_{f_c - \frac{W}{2}}^{f_c + \frac{W}{2}} \hat{x}_{\mathrm{PB}}(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{7.12}$$

(b) The signal $\mathbf{x}_{\mathrm{A}}$ is a continuous integrable signal satisfying

$$\hat{x}_{\mathrm{A}}(f) = \begin{cases} \hat{x}_{\mathrm{PB}}(f) & \text{if } f \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{7.13}$$

(c) The signal $\mathbf{x}_{\mathrm{A}}$ is an integrable passband signal that is bandlimited to W Hz
around the carrier frequency $f_c$ and that satisfies (7.13).

(d) The signal $\mathbf{x}_{\mathrm{A}}$ is given by

$$\mathbf{x}_{\mathrm{A}} = \mathbf{x}_{\mathrm{PB}} \star \check{\mathbf{g}} \tag{7.14a}$$

for every integrable mapping $\mathbf{g}\colon f \mapsto g(f)$ satisfying

$$g(f) = 1, \quad |f - f_c| \le \frac{W}{2}, \tag{7.14b}$$

and

$$g(f) = 0, \quad |f + f_c| \le \frac{W}{2} \tag{7.14c}$$

(with $g(f)$ unspecified at other frequencies).

Proof. That Condition (a) is equivalent to the statement that $\mathbf{x}_{\mathrm{A}}$ is the analytic
representation of $\mathbf{x}_{\mathrm{PB}}$ is just a restatement of Definition 7.5.1. It thus only remains
to show that Conditions (a), (b), (c), and (d) are equivalent. We shall do so by
establishing that (a) ⇔ (d); that (b) ⇔ (c); that (b) ⇒ (a); and that (d) ⇒ (c).

To establish (a) ⇔ (d) we use the integrability of $\mathbf{x}_{\mathrm{PB}}$ and of $\mathbf{g}$ to compute $\mathbf{x}_{\mathrm{PB}} \star \check{\mathbf{g}}$
using Proposition 6.2.5 as

$$\begin{aligned}
(\mathbf{x}_{\mathrm{PB}} \star \check{\mathbf{g}})(t) &= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{PB}}(f)\, g(f)\, e^{i2\pi f t}\, df \\
&= \int_{0}^{\infty} \hat{x}_{\mathrm{PB}}(f)\, g(f)\, e^{i2\pi f t}\, df \\
&= \int_{f_c - \frac{W}{2}}^{f_c + \frac{W}{2}} \hat{x}_{\mathrm{PB}}(f)\, g(f)\, e^{i2\pi f t}\, df \\
&= \int_{f_c - \frac{W}{2}}^{f_c + \frac{W}{2}} \hat{x}_{\mathrm{PB}}(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R},
\end{aligned}$$

where the first equality follows from Proposition 6.2.5; the second because the
assumption that $\mathbf{x}_{\mathrm{PB}}$ is a passband signal implies, by Proposition 7.2.2 (cf. (c)),
that the only negative frequencies $f < 0$ where $\hat{x}_{\mathrm{PB}}(f)$ can be nonzero are those
satisfying $|-f - f_c| \le W/2$, and at those frequencies $g$ is zero by (7.14c); the third
by Proposition 7.2.2 (cf. (c)); and the fourth equality by (7.14b). This establishes
that (a) ⇔ (d).

The equivalence (b) ⇔ (c) is an immediate consequence of Proposition 7.2.2. That
(b) ⇒ (a) can be proved using Corollary 6.2.14 as follows. If (b) holds, then $\mathbf{x}_{\mathrm{A}}$
is a continuous integrable signal whose FT is given by the integrable function on
the RHS of (7.13) and therefore, by Corollary 6.2.14, $\mathbf{x}_{\mathrm{A}}$ is the IFT of the RHS of
(7.13), thus establishing (a).

We now complete the proof by showing that (d) ⇒ (c). To this end let $\mathbf{g}\colon f \mapsto g(f)$
be a continuous integrable function satisfying (7.14b) & (7.14c) and additionally
satisfying that its IFT $\check{\mathbf{g}}$ is integrable. For example, $\mathbf{g}$ could be the function from $\mathbb{R}$
to $\mathbb{R}$ that is defined by

$$g(f) = \begin{cases} 1 & \text{if } |f - f_c| \le W/2, \\ 0 & \text{if } |f - f_c| \ge W_c/2, \\ \dfrac{W_c - 2|f - f_c|}{W_c - W} & \text{otherwise,} \end{cases} \tag{7.15}$$

where $W_c$ can be chosen arbitrarily in the range

$$W < W_c < 2 f_c. \tag{7.16}$$

This function is depicted in Figure 7.8. By direct calculation, it can be shown that
its IFT is given by$^1$

$$\check{g}(t) = e^{i2\pi f_c t}\, \frac{1}{(\pi t)^2}\, \frac{\cos(\pi W t) - \cos(\pi W_c t)}{W_c - W}, \quad t \in \mathbb{R}, \tag{7.17}$$

which is integrable. Define now $\mathbf{h} = \check{\mathbf{g}}$ and note that, by Corollary 6.2.14, $\hat{\mathbf{h}} = \mathbf{g}$.
If (d) holds, then

$$\begin{aligned}
\mathbf{x}_{\mathrm{A}} &= \mathbf{x}_{\mathrm{PB}} \star \check{\mathbf{g}} \\
&= \mathbf{x}_{\mathrm{PB}} \star \mathbf{h},
\end{aligned}$$

so $\mathbf{x}_{\mathrm{A}}$ is the result of feeding an integrable passband signal that is bandlimited
to W Hz around the carrier frequency $f_c$ (the signal $\mathbf{x}_{\mathrm{PB}}$) through a stable filter
(of impulse response $\mathbf{h}$). Consequently, by Proposition 7.2.5, $\mathbf{x}_{\mathrm{A}}$ is an integrable
passband signal that is bandlimited to W Hz around the carrier frequency $f_c$ and
its FT is given by $f \mapsto \hat{x}_{\mathrm{PB}}(f)\, \hat{h}(f)$. Thus, as we next justify,

$$\begin{aligned}
\hat{x}_{\mathrm{A}}(f) &= \hat{x}_{\mathrm{PB}}(f)\, \hat{h}(f) \\
&= \hat{x}_{\mathrm{PB}}(f)\, g(f) \\
&= \hat{x}_{\mathrm{PB}}(f)\, g(f)\, \mathrm{I}\{f \ge 0\} \\
&= \hat{x}_{\mathrm{PB}}(f)\, \mathrm{I}\{f \ge 0\}, \quad f \in \mathbb{R},
\end{aligned}$$

thus establishing (c). Here the third equality is justified by noting that the assumption
that $\mathbf{x}_{\mathrm{PB}}$ is a passband signal implies, by Proposition 7.2.2 (cf. (c)),
that the only negative frequencies $f < 0$ where $\hat{x}_{\mathrm{PB}}(f)$ can be nonzero are those
satisfying $|-f - f_c| \le W/2$, and at those frequencies $g$ is zero by (7.15), (7.16),
and (7.1b). The fourth equality follows by noting that the assumption that $\mathbf{x}_{\mathrm{PB}}$
is a passband signal implies, by Proposition 7.2.2 (cf. (c)), that the only positive
frequencies $f > 0$ where $\hat{x}_{\mathrm{PB}}(f)$ can be nonzero are those satisfying $|f - f_c| \le W/2$
and at those frequencies $g(f) = 1$ by (7.15). □
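The closed form (7.17) is stated without derivation, so a numerical sanity check may be reassuring. The sketch below is an illustration only: the values of `W`, `W_c`, and `f_c` are arbitrary choices satisfying (7.16), and the IFT integral is approximated by a midpoint rule rather than computed exactly.

```python
import numpy as np

W, W_c, f_c = 2.0, 4.0, 5.0   # illustrative values with W < W_c < 2 * f_c

def g(f):
    """The trapezoid of (7.15): 1 on the band, linear roll-off, 0 elsewhere."""
    ramp = (W_c - 2 * np.abs(f - f_c)) / (W_c - W)
    return np.clip(ramp, 0.0, 1.0)

def ift_numeric(t, M=200_000):
    """Midpoint-rule approximation of the integral of g(f) exp(i 2 pi f t) df."""
    df = W_c / M
    f = f_c - W_c / 2 + (np.arange(M) + 0.5) * df   # grid over the support of g
    return np.sum(g(f) * np.exp(2j * np.pi * f * t)) * df

def ift_closed_form(t):
    """Right-hand side of (7.17); at t = 0 it is read as (W + W_c) / 2."""
    if t == 0:
        return (W + W_c) / 2
    envelope = (np.cos(np.pi * W * t) - np.cos(np.pi * W_c * t)) \
        / ((np.pi * t) ** 2 * (W_c - W))
    return np.exp(2j * np.pi * f_c * t) * envelope

times = [0.0, 0.1, 0.37, 1.0, -2.5]
max_error = max(abs(ift_numeric(t) - ift_closed_form(t)) for t in times)
print(max_error)
```

The envelope decays like $1/t^2$, which is what makes the IFT integrable and hence usable as the impulse response of a stable filter in the proof.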



7.5.2 From $\mathbf{x}_{\mathrm{A}}$ back to $\mathbf{x}_{\mathrm{PB}}$

Proposition 7.5.2 describes the analytic representation $\mathbf{x}_{\mathrm{A}}$ in terms of the real
passband signal $\mathbf{x}_{\mathrm{PB}}$. This representation would have been useless if we had not
been able to recover $\mathbf{x}_{\mathrm{PB}}$ from $\mathbf{x}_{\mathrm{A}}$. Fortunately, we can. The key is that, because
$\mathbf{x}_{\mathrm{PB}}$ is real, its FT is conjugate-symmetric

$$\hat{x}_{\mathrm{PB}}(-f) = \hat{x}^*_{\mathrm{PB}}(f), \quad f \in \mathbb{R}. \tag{7.18}$$

Consequently, since the FT of $\mathbf{x}_{\mathrm{A}}$ is equal to that of $\mathbf{x}_{\mathrm{PB}}$ at the positive frequencies
and to zero at the negative frequencies (7.13), we can add to $\hat{\mathbf{x}}_{\mathrm{A}}$ its conjugated
mirror-image to obtain $\hat{\mathbf{x}}_{\mathrm{PB}}$:

$$\hat{x}_{\mathrm{PB}}(f) = \hat{x}_{\mathrm{A}}(f) + \hat{x}^*_{\mathrm{A}}(-f), \quad f \in \mathbb{R}; \tag{7.19}$$

see Figure 7.12. From here it is just a technicality to obtain the
time-domain relationship

$$x_{\mathrm{PB}}(t) = 2\, \mathrm{Re}\bigl(x_{\mathrm{A}}(t)\bigr), \quad t \in \mathbb{R}. \tag{7.20}$$

$^1$ At $t = 0$, the RHS of (7.17) should be interpreted as $(W + W_c)/2$.

Figure 7.8: The function $\mathbf{g}$ of (7.15), which is used in the proof of Proposition 7.5.2.

These results are summarized in the following proposition. 

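Proposition 7.5.3 (Recovering $\mathbf{x}_{\mathrm{PB}}$ from $\mathbf{x}_{\mathrm{A}}$). Let $\mathbf{x}_{\mathrm{PB}}$ be a real integrable passband
signal that is bandlimited to W Hz around the carrier frequency $f_c$, and let $\mathbf{x}_{\mathrm{A}}$
be its analytic representation. Then,

$$\hat{x}_{\mathrm{PB}}(f) = \hat{x}_{\mathrm{A}}(f) + \hat{x}^*_{\mathrm{A}}(-f), \quad f \in \mathbb{R}, \tag{7.21a}$$

and

$$x_{\mathrm{PB}}(t) = 2\, \mathrm{Re}\bigl(x_{\mathrm{A}}(t)\bigr), \quad t \in \mathbb{R}. \tag{7.21b}$$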

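Proof. The frequency relation (7.21a) is just a restatement of (7.19), whose derivation
was rigorous. To prove (7.21b) we note that, by Proposition 7.2.2 (cf. (b) &
(c)),

$$\begin{aligned}
x_{\mathrm{PB}}(t) &= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{PB}}(f)\, e^{i2\pi f t}\, df \\
&= \int_{0}^{\infty} \hat{x}_{\mathrm{PB}}(f)\, e^{i2\pi f t}\, df + \int_{-\infty}^{0} \hat{x}_{\mathrm{PB}}(f)\, e^{i2\pi f t}\, df \\
&= x_{\mathrm{A}}(t) + \int_{-\infty}^{0} \hat{x}_{\mathrm{PB}}(f)\, e^{i2\pi f t}\, df \\
&= x_{\mathrm{A}}(t) + \int_{0}^{\infty} \hat{x}_{\mathrm{PB}}(-\tilde{f})\, e^{-i2\pi \tilde{f} t}\, d\tilde{f} \\
&= x_{\mathrm{A}}(t) + \int_{0}^{\infty} \hat{x}^*_{\mathrm{PB}}(\tilde{f})\, e^{-i2\pi \tilde{f} t}\, d\tilde{f} \\
&= x_{\mathrm{A}}(t) + \left( \int_{0}^{\infty} \hat{x}_{\mathrm{PB}}(\tilde{f})\, e^{i2\pi \tilde{f} t}\, d\tilde{f} \right)^{*} \\
&= x_{\mathrm{A}}(t) + x^*_{\mathrm{A}}(t) \\
&= 2\, \mathrm{Re}\bigl(x_{\mathrm{A}}(t)\bigr), \quad t \in \mathbb{R},
\end{aligned}$$

where in the second equality we broke the integral into two; in the third we used
Definition 7.5.1; in the fourth we changed the integration variable to $\tilde{f} = -f$;
in the fifth we used the conjugate symmetry of $\hat{\mathbf{x}}_{\mathrm{PB}}$ (7.18); in the sixth we used
the fact that conjugating the integrand results in the conjugation of the integral
(Proposition 2.3.1); in the seventh we used the definition of the analytic signal;
and in the last equality we used the fact that a complex number and its conjugate
add up to twice its real part. □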
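Definition 7.5.1 and the recovery formula (7.21b) can be mimicked in a discrete setting by zeroing the negative-frequency DFT bins. The following is a minimal sketch under assumptions that are not from the text: one period of a periodic signal stands in for the passband signal, the DFT stands in for the FT, and the bin range 190 to 210 is an arbitrary choice. Note that no factor of two is applied to the positive frequencies, in line with the convention adopted here (some library routines for the analytic signal do apply such a factor).

```python
import numpy as np

N = 2048
n = np.arange(N)
rng = np.random.default_rng(1)

# Real passband signal occupying DFT bins 190..210 (carrier bin 200).
bins = np.arange(190, 211)
amps = rng.normal(size=bins.size)
phases = rng.uniform(0, 2 * np.pi, size=bins.size)
x_pb = sum(a * np.cos(2 * np.pi * k * n / N + p)
           for k, a, p in zip(bins, amps, phases))

# Discrete analog of (7.10): keep only the nonnegative-frequency part.
X_pb = np.fft.fft(x_pb)
freqs = np.fft.fftfreq(N)                 # normalized frequency of each bin
X_a = np.where(freqs >= 0, X_pb, 0.0)
x_a = np.fft.ifft(X_a)                    # complex-valued analytic signal

# Discrete analog of (7.21b): twice the real part gives back x_pb.
x_rec = 2 * x_a.real
print(np.allclose(x_rec, x_pb))
```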

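7.5.3 Relating $\langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle$ to $\langle \mathbf{x}_{\mathrm{A}}, \mathbf{y}_{\mathrm{A}} \rangle$

We next relate the inner product between two real passband signals to the inner
product between their analytic representations.

Proposition 7.5.4 ($\langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle$ and $\langle \mathbf{x}_{\mathrm{A}}, \mathbf{y}_{\mathrm{A}} \rangle$). Let $\mathbf{x}_{\mathrm{PB}}$ and $\mathbf{y}_{\mathrm{PB}}$ be real integrable
passband signals that are bandlimited to W Hz around the carrier frequency $f_c$, and
let $\mathbf{x}_{\mathrm{A}}$ and $\mathbf{y}_{\mathrm{A}}$ be their analytic representations. Then

$$\langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle = 2\, \mathrm{Re}\bigl(\langle \mathbf{x}_{\mathrm{A}}, \mathbf{y}_{\mathrm{A}} \rangle\bigr), \tag{7.22}$$

and

$$\|\mathbf{x}_{\mathrm{PB}}\|_2^2 = 2\, \|\mathbf{x}_{\mathrm{A}}\|_2^2. \tag{7.23}$$

Note that in (7.22) the inner product appearing on the LHS is the inner product
between real signals whereas the one appearing on the RHS is between complex
signals.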

Proof. We first note that the inner products and energies are well-defined because
integrable passband signals are also energy-limited (Corollary 7.2.4). Next, even
though (7.23) is a special case of (7.22), we first prove (7.23). The proof is a simple
application of Parseval's Theorem. The intuition is as follows. Since $\mathbf{x}_{\mathrm{PB}}$ is real,
it follows that its FT is conjugate-symmetric (7.18) so the magnitude of $\hat{\mathbf{x}}_{\mathrm{PB}}$ is
symmetric. Consequently, the positive frequencies and the negative frequencies
of $\hat{\mathbf{x}}_{\mathrm{PB}}$ contribute an equal share to the total energy in $\mathbf{x}_{\mathrm{PB}}$. And since the energy
in the analytic representation is equal to the share corresponding to the positive
frequencies only, its energy must be half the energy of $\mathbf{x}_{\mathrm{PB}}$.

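This can be argued more formally as follows. Because $\mathbf{x}_{\mathrm{PB}}$ is real-valued, its FT $\hat{\mathbf{x}}_{\mathrm{PB}}$
is conjugate-symmetric (7.18), so its magnitude is symmetric $|\hat{x}_{\mathrm{PB}}(f)| = |\hat{x}_{\mathrm{PB}}(-f)|$
for all $f \in \mathbb{R}$ and, a fortiori,

$$\int_{0}^{\infty} |\hat{x}_{\mathrm{PB}}(f)|^2\, df = \int_{-\infty}^{0} |\hat{x}_{\mathrm{PB}}(f)|^2\, df. \tag{7.24}$$

Also, by Parseval's Theorem (applied to $\mathbf{x}_{\mathrm{PB}}$),

$$\int_{0}^{\infty} |\hat{x}_{\mathrm{PB}}(f)|^2\, df + \int_{-\infty}^{0} |\hat{x}_{\mathrm{PB}}(f)|^2\, df = \|\mathbf{x}_{\mathrm{PB}}\|_2^2. \tag{7.25}$$

Consequently, by combining (7.24) and (7.25), we obtain

$$\int_{0}^{\infty} |\hat{x}_{\mathrm{PB}}(f)|^2\, df = \frac{1}{2}\, \|\mathbf{x}_{\mathrm{PB}}\|_2^2. \tag{7.26}$$

We can now establish (7.23) from (7.26) by using Parseval's Theorem (applied
to $\mathbf{x}_{\mathrm{A}}$) and (7.13) to obtain

$$\begin{aligned}
\|\mathbf{x}_{\mathrm{A}}\|_2^2 &= \|\hat{\mathbf{x}}_{\mathrm{A}}\|_2^2 \\
&= \int_{-\infty}^{\infty} |\hat{x}_{\mathrm{A}}(f)|^2\, df \\
&= \int_{0}^{\infty} |\hat{x}_{\mathrm{PB}}(f)|^2\, df \\
&= \frac{1}{2}\, \|\mathbf{x}_{\mathrm{PB}}\|_2^2,
\end{aligned}$$

where the last equality follows from (7.26).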

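We next prove (7.22). We offer two proofs. The first is very similar to our proof
of (7.23): we use Parseval's Theorem to express the inner products in the frequency
domain, and then argue that the contribution of the negative frequencies
to the inner product is the complex conjugate of the contribution of the positive
frequencies. The second proof uses a trick to relate inner products and energies.

We begin with the first proof. Using Proposition 7.5.3 we have

$$\begin{aligned}
\hat{x}_{\mathrm{PB}}(f) &= \hat{x}_{\mathrm{A}}(f) + \hat{x}^*_{\mathrm{A}}(-f), \quad f \in \mathbb{R}, \\
\hat{y}_{\mathrm{PB}}(f) &= \hat{y}_{\mathrm{A}}(f) + \hat{y}^*_{\mathrm{A}}(-f), \quad f \in \mathbb{R}.
\end{aligned}$$

Using Parseval's Theorem we now have

$$\begin{aligned}
\langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle &= \langle \hat{\mathbf{x}}_{\mathrm{PB}}, \hat{\mathbf{y}}_{\mathrm{PB}} \rangle \\
&= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{PB}}(f)\, \hat{y}^*_{\mathrm{PB}}(f)\, df \\
&= \int_{-\infty}^{\infty} \bigl(\hat{x}_{\mathrm{A}}(f) + \hat{x}^*_{\mathrm{A}}(-f)\bigr) \bigl(\hat{y}_{\mathrm{A}}(f) + \hat{y}^*_{\mathrm{A}}(-f)\bigr)^{*}\, df \\
&= \int_{-\infty}^{\infty} \bigl(\hat{x}_{\mathrm{A}}(f) + \hat{x}^*_{\mathrm{A}}(-f)\bigr) \bigl(\hat{y}^*_{\mathrm{A}}(f) + \hat{y}_{\mathrm{A}}(-f)\bigr)\, df \\
&= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{A}}(f)\, \hat{y}^*_{\mathrm{A}}(f)\, df + \int_{-\infty}^{\infty} \hat{x}^*_{\mathrm{A}}(-f)\, \hat{y}_{\mathrm{A}}(-f)\, df \\
&= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{A}}(f)\, \hat{y}^*_{\mathrm{A}}(f)\, df + \left( \int_{-\infty}^{\infty} \hat{x}_{\mathrm{A}}(-f)\, \hat{y}^*_{\mathrm{A}}(-f)\, df \right)^{*} \\
&= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{A}}(f)\, \hat{y}^*_{\mathrm{A}}(f)\, df + \left( \int_{-\infty}^{\infty} \hat{x}_{\mathrm{A}}(\tilde{f})\, \hat{y}^*_{\mathrm{A}}(\tilde{f})\, d\tilde{f} \right)^{*} \\
&= \langle \hat{\mathbf{x}}_{\mathrm{A}}, \hat{\mathbf{y}}_{\mathrm{A}} \rangle + \langle \hat{\mathbf{x}}_{\mathrm{A}}, \hat{\mathbf{y}}_{\mathrm{A}} \rangle^{*} \\
&= 2\, \mathrm{Re}\bigl(\langle \hat{\mathbf{x}}_{\mathrm{A}}, \hat{\mathbf{y}}_{\mathrm{A}} \rangle\bigr) \\
&= 2\, \mathrm{Re}\bigl(\langle \mathbf{x}_{\mathrm{A}}, \mathbf{y}_{\mathrm{A}} \rangle\bigr),
\end{aligned}$$

where the fifth equality follows because at all frequencies $f \in \mathbb{R}$ the cross-terms
$\hat{x}_{\mathrm{A}}(f)\, \hat{y}_{\mathrm{A}}(-f)$ and $\hat{x}^*_{\mathrm{A}}(-f)\, \hat{y}^*_{\mathrm{A}}(f)$ are zero, and where the last equality follows from
Parseval's Theorem.

The second proof is based on (7.23) and on the identity

$$2\, \mathrm{Re}\bigl(\langle \mathbf{u}, \mathbf{v} \rangle\bigr) = \|\mathbf{u} + \mathbf{v}\|_2^2 - \|\mathbf{u}\|_2^2 - \|\mathbf{v}\|_2^2, \quad \mathbf{u}, \mathbf{v} \in \mathcal{L}_2, \tag{7.27}$$

which holds for both complex and real signals and which follows by expressing
$\|\mathbf{u} + \mathbf{v}\|_2^2$ as

$$\begin{aligned}
\|\mathbf{u} + \mathbf{v}\|_2^2 &= \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle \\
&= \langle \mathbf{u}, \mathbf{u} \rangle + \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{u} \rangle + \langle \mathbf{v}, \mathbf{v} \rangle \\
&= \|\mathbf{u}\|_2^2 + \|\mathbf{v}\|_2^2 + \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{v} \rangle^{*} \\
&= \|\mathbf{u}\|_2^2 + \|\mathbf{v}\|_2^2 + 2\, \mathrm{Re}\bigl(\langle \mathbf{u}, \mathbf{v} \rangle\bigr).
\end{aligned}$$

From Identity (7.27) and from (7.23) we have for the real signals $\mathbf{x}_{\mathrm{PB}}$ and $\mathbf{y}_{\mathrm{PB}}$

$$\begin{aligned}
2\, \langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle &= 2\, \mathrm{Re}\bigl(\langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle\bigr) \\
&= \|\mathbf{x}_{\mathrm{PB}} + \mathbf{y}_{\mathrm{PB}}\|_2^2 - \|\mathbf{x}_{\mathrm{PB}}\|_2^2 - \|\mathbf{y}_{\mathrm{PB}}\|_2^2 \\
&= 2 \bigl( \|\mathbf{x}_{\mathrm{A}} + \mathbf{y}_{\mathrm{A}}\|_2^2 - \|\mathbf{x}_{\mathrm{A}}\|_2^2 - \|\mathbf{y}_{\mathrm{A}}\|_2^2 \bigr) \\
&= 4\, \mathrm{Re}\bigl(\langle \mathbf{x}_{\mathrm{A}}, \mathbf{y}_{\mathrm{A}} \rangle\bigr),
\end{aligned}$$

where the first equality follows because the passband signals are real; the second
from Identity (7.27) applied to the passband signals $\mathbf{x}_{\mathrm{PB}}$ and $\mathbf{y}_{\mathrm{PB}}$; the third from
the second part of Proposition 7.5.4 and because the analytic representation of
$\mathbf{x}_{\mathrm{PB}} + \mathbf{y}_{\mathrm{PB}}$ is $\mathbf{x}_{\mathrm{A}} + \mathbf{y}_{\mathrm{A}}$; and the final equality from Identity (7.27) applied to the
analytic signals $\mathbf{x}_{\mathrm{A}}$ and $\mathbf{y}_{\mathrm{A}}$. □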
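The relations (7.22) and (7.23) also hold exactly for the discrete analog in which sums replace integrals and the DFT replaces the FT. The sketch below is illustrative only: the helper names `random_passband`, `analytic`, and `inner` are hypothetical and the bin range is an arbitrary assumption.

```python
import numpy as np

N = 2048
n = np.arange(N)
rng = np.random.default_rng(4)

def random_passband(bins):
    """Real signal made of sinusoids at the given positive DFT bins."""
    amps = rng.normal(size=len(bins))
    phases = rng.uniform(0, 2 * np.pi, size=len(bins))
    return sum(a * np.cos(2 * np.pi * k * n / N + p)
               for k, a, p in zip(bins, amps, phases))

def analytic(x):
    """Discrete analog of (7.10): discard the negative-frequency bins."""
    X = np.fft.fft(x)
    return np.fft.ifft(np.where(np.fft.fftfreq(x.size) >= 0, X, 0.0))

def inner(u, v):
    """Inner product <u, v>: sum of u[n] times the conjugate of v[n]."""
    return np.sum(u * np.conj(v))

bins = range(190, 211)
x_pb, y_pb = random_passband(bins), random_passband(bins)
x_a, y_a = analytic(x_pb), analytic(y_pb)

# Discrete analogs of (7.22) and (7.23).
lhs_inner, rhs_inner = inner(x_pb, y_pb), 2 * inner(x_a, y_a).real
lhs_energy, rhs_energy = inner(x_pb, x_pb), 2 * inner(x_a, x_a).real
print(lhs_inner, rhs_inner, lhs_energy, rhs_energy)
```

Taking the real part on the right-hand side matters: the inner product of the analytic signals is in general complex, while the inner product of the real passband signals is real.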
7.6 Baseband Representation of Real Passband Signals

Strictly speaking, the baseband representation $\mathbf{x}_{\mathrm{BB}}$ of a real passband signal
$\mathbf{x}_{\mathrm{PB}}$ is not a "representation" because one cannot recover $\mathbf{x}_{\mathrm{PB}}$ from $\mathbf{x}_{\mathrm{BB}}$ alone;
one also needs to know the carrier frequency $f_c$. This may seem like a disadvantage,
but engineers view this as an advantage. Indeed, in some cases, it may illuminate
the fact that certain operations and results do not depend on the carrier frequency.
This decoupling of various operations from the carrier frequency is very useful in
hardware implementation of communication systems that need to work around
selectable carrier frequencies. It allows for some of the processing to be done using
carrier-independent hardware and for only a small part of the communication
system to be tunable to the carrier frequency. Very loosely speaking, engineers
think of $\mathbf{x}_{\mathrm{BB}}$ as everything about $\mathbf{x}_{\mathrm{PB}}$ that is not carrier-dependent. Thus, one
does not usually expect the quantity $f_c$ to show up in a formula for the baseband
representation. Philosophical thoughts aside, the baseband representation has a
straightforward definition.

7.6.1 Definition and Characterization 

Definition 7.6.1 (Baseband Representation). The baseband representation of
a real integrable passband signal $\mathbf{x}_{\mathrm{PB}}$ that is bandlimited to W Hz around the carrier
frequency $f_c$ is the complex signal

$$x_{\mathrm{BB}}(t) = e^{-i2\pi f_c t}\, x_{\mathrm{A}}(t), \quad t \in \mathbb{R}, \tag{7.28}$$

where $\mathbf{x}_{\mathrm{A}}$ is the analytic representation of $\mathbf{x}_{\mathrm{PB}}$.

Note that, by (7.28), the magnitudes of $\mathbf{x}_{\mathrm{A}}$ and $\mathbf{x}_{\mathrm{BB}}$ are identical

$$|x_{\mathrm{BB}}(t)| = |x_{\mathrm{A}}(t)|, \quad t \in \mathbb{R}. \tag{7.29}$$

Consequently, since $\mathbf{x}_{\mathrm{A}}$ is integrable we also have:

Proposition 7.6.2 (Integrability of $\mathbf{x}_{\mathrm{PB}}$ Implies Integrability of $\mathbf{x}_{\mathrm{BB}}$). The baseband
representation of a real integrable passband signal that is bandlimited to W
Hz around the carrier frequency $f_c$ is integrable.

By (7.28) and (7.13) we obtain that if $\mathbf{x}_{\mathrm{PB}}$ is a real integrable passband signal that
is bandlimited to W Hz around the carrier frequency $f_c$, then

$$\hat{x}_{\mathrm{BB}}(f) = \hat{x}_{\mathrm{A}}(f + f_c) = \begin{cases} \hat{x}_{\mathrm{PB}}(f + f_c) & \text{if } |f| \le W/2, \\ 0 & \text{otherwise.} \end{cases} \tag{7.30}$$

Thus, the FT of $\mathbf{x}_{\mathrm{BB}}$ is the FT of $\mathbf{x}_{\mathrm{A}}$ but shifted to the left by the carrier frequency
$f_c$. The relationship between the Fourier Transforms of $\mathbf{x}_{\mathrm{PB}}$, $\mathbf{x}_{\mathrm{A}}$, and $\mathbf{x}_{\mathrm{BB}}$
is depicted in Figure 7.9.

We have defined the baseband representation of a passband signal in terms of its
analytic representation, but sometimes it is useful to define the baseband representation
directly in terms of the passband signal. This is not very difficult. Rather
than taking the passband signal and passing it through a filter of frequency response
$\mathbf{g}$ satisfying (7.14) to obtain $\mathbf{x}_{\mathrm{A}}$ and then multiplying the result by $e^{-i2\pi f_c t}$
to obtain $\mathbf{x}_{\mathrm{BB}}$, we can multiply $\mathbf{x}_{\mathrm{PB}}$ by $t \mapsto e^{-i2\pi f_c t}$ and then filter the result to
obtain the baseband representation. This procedure is depicted in the frequency
domain in Figure 7.10 and is made precise in the following proposition.

Proposition 7.6.3 (From $\mathbf{x}_{\mathrm{PB}}$ to $\mathbf{x}_{\mathrm{BB}}$ Directly). If $\mathbf{x}_{\mathrm{PB}}$ is a real integrable passband
signal that is bandlimited to W Hz around the carrier frequency $f_c$, then its baseband
representation $\mathbf{x}_{\mathrm{BB}}$ is given by

$$\mathbf{x}_{\mathrm{BB}} = \bigl(t \mapsto e^{-i2\pi f_c t}\, x_{\mathrm{PB}}(t)\bigr) \star \check{\mathbf{g}}_0, \tag{7.31a}$$

where $\mathbf{g}_0\colon f \mapsto g_0(f)$ is any integrable function satisfying

$$g_0(f) = 1, \quad |f| \le \frac{W}{2}, \tag{7.31b}$$

and

$$g_0(f) = 0, \quad |f + 2f_c| \le \frac{W}{2}. \tag{7.31c}$$

Figure 7.9: The Fourier Transforms of the analytic signal $\mathbf{x}_{\mathrm{A}}$ and of the baseband
representation $\mathbf{x}_{\mathrm{BB}}$ of a real passband signal $\mathbf{x}_{\mathrm{PB}}$. (Panels, from top to bottom:
$\hat{x}_{\mathrm{PB}}(f)$, $\hat{x}_{\mathrm{A}}(f)$, and $\hat{x}_{\mathrm{BB}}(f)$.)



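Proof. The proof is all in Figure 7.10. For the pedantic reader we provide more
details. By Definition 7.6.1 and by Proposition 7.5.2 (cf. (d)) we have for any
integrable function $\mathbf{g}\colon f \mapsto g(f)$ satisfying (7.14b) & (7.14c)

$$\begin{aligned}
x_{\mathrm{BB}}(t) &= e^{-i2\pi f_c t}\, (\mathbf{x}_{\mathrm{PB}} \star \check{\mathbf{g}})(t) \\
&= e^{-i2\pi f_c t} \int_{-\infty}^{\infty} \hat{x}_{\mathrm{PB}}(f)\, g(f)\, e^{i2\pi f t}\, df \\
&= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{PB}}(f)\, g(f)\, e^{i2\pi (f - f_c) t}\, df \\
&= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{PB}}(\tilde{f} + f_c)\, g(\tilde{f} + f_c)\, e^{i2\pi \tilde{f} t}\, d\tilde{f} \\
&= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{PB}}(\tilde{f} + f_c)\, g_0(\tilde{f})\, e^{i2\pi \tilde{f} t}\, d\tilde{f} \\
&= \Bigl( \bigl(\tau \mapsto e^{-i2\pi f_c \tau}\, x_{\mathrm{PB}}(\tau)\bigr) \star \check{\mathbf{g}}_0 \Bigr)(t),
\end{aligned}$$

where we defined

$$g_0(f) = g(f + f_c), \quad f \in \mathbb{R}, \tag{7.32}$$

and where we use the following justification. The second equality follows from
Proposition 6.2.5; the third by pulling the complex exponential into the integral;
the fourth by defining $\tilde{f} = f - f_c$; the fifth by defining the function $\mathbf{g}_0$ as in
(7.32); and the final equality by Proposition 6.2.5 using the fact that

$$\text{the FT of } t \mapsto e^{-i2\pi f_c t}\, x_{\mathrm{PB}}(t) \text{ is } f \mapsto \hat{x}_{\mathrm{PB}}(f + f_c). \tag{7.33}$$

The proposition now follows by noting that $\mathbf{g}$ satisfies (7.14b) & (7.14c) if, and
only if, the mapping $\mathbf{g}_0$ defined in (7.32) satisfies (7.31b) & (7.31c). □

Figure 7.10: A frequency-domain description of the process for deriving $\mathbf{x}_{\mathrm{BB}}$ directly
from $\mathbf{x}_{\mathrm{PB}}$. From top to bottom: $\hat{\mathbf{x}}_{\mathrm{PB}}$; the FT of $t \mapsto e^{-i2\pi f_c t}\, x_{\mathrm{PB}}(t)$; a
function $\mathbf{g}_0$ satisfying (7.31b) & (7.31c); and $\hat{\mathbf{x}}_{\mathrm{BB}}$.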

Corollary 7.6.4. If $\mathbf{x}_{\mathrm{PB}}$ is a real integrable passband signal that is bandlimited to W
Hz around the carrier frequency $f_c$, then its baseband representation $\mathbf{x}_{\mathrm{BB}}$ is given
by

$$\mathbf{x}_{\mathrm{BB}} = \bigl(t \mapsto e^{-i2\pi f_c t}\, x_{\mathrm{PB}}(t)\bigr) \star \mathrm{LPF}_{W_c}, \tag{7.34a}$$

where the cutoff frequency $W_c$ can be chosen arbitrarily in the range

$$\frac{W}{2} \le W_c \le 2 f_c - \frac{W}{2}. \tag{7.34b}$$

Proof. Let $W_c$ satisfy (7.34b) and define $\mathbf{g}_0$ as follows: if $W_c$ is strictly smaller
than $2f_c - W/2$, define $g_0(f) = \mathrm{I}\{|f| \le W_c\}$; otherwise define $g_0(f) = \mathrm{I}\{|f| < W_c\}$.
In both cases $\mathbf{g}_0$ satisfies (7.31b) & (7.31c) and

$$\check{\mathbf{g}}_0 = \mathrm{LPF}_{W_c}. \tag{7.35}$$

The result now follows by applying Proposition 7.6.3 with this choice of $\mathbf{g}_0$. □

In analogy to Proposition 7.5.2, we can characterize the baseband representation 
of passband signals as follows. 

Proposition 7.6.5 (Characterizing the Baseband Representation). Let $\mathbf{x}_{\mathrm{PB}}$ be
a real integrable passband signal that is bandlimited to W Hz around the carrier
frequency $f_c$. Then each of the following statements is equivalent to the statement
that the complex signal $\mathbf{x}_{\mathrm{BB}}$ is its baseband representation.

(a) The signal $\mathbf{x}_{\mathrm{BB}}$ is given by

$$x_{\mathrm{BB}}(t) = \int_{-W/2}^{W/2} \hat{x}_{\mathrm{PB}}(f + f_c)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{7.36}$$

(b) The signal $\mathbf{x}_{\mathrm{BB}}$ is a continuous integrable signal satisfying

$$\hat{x}_{\mathrm{BB}}(f) = \hat{x}_{\mathrm{PB}}(f + f_c)\, \mathrm{I}\Bigl\{|f| \le \frac{W}{2}\Bigr\}, \quad f \in \mathbb{R}. \tag{7.37}$$

(c) The signal $\mathbf{x}_{\mathrm{BB}}$ is an integrable signal that is bandlimited to W/2 Hz and that
satisfies (7.37).

(d) The signal $\mathbf{x}_{\mathrm{BB}}$ is given by (7.31a) for any $\mathbf{g}_0\colon f \mapsto g_0(f)$ satisfying (7.31b)
& (7.31c).

Proof. Parts (a), (b), and (c) can be easily deduced from their counterparts in
Proposition 7.5.2 using Definition 7.6.1 and the fact that (7.29) implies that the
integrability of $\mathbf{x}_{\mathrm{BB}}$ is equivalent to the integrability of $\mathbf{x}_{\mathrm{A}}$. Part (d) is a restatement
of Proposition 7.6.3. □



7.6.2 The In-Phase and Quadrature Components 

The convolution in (7.34a) is a convolution between a complex signal (the signal
$t \mapsto e^{-i2\pi f_c t}\, x_{\mathrm{PB}}(t)$) and a real signal (the signal $\mathrm{LPF}_{W_c}$). This should not alarm
you. The convolution of two complex signals evaluated at time $t$ is expressed as an
integral (5.2), and in the case of complex signals this is an integral (over the real
line) of a complex-valued integrand. Such integrals were addressed in Section 2.3.
It should, however, be noted that since the definition of the convolution of two signals
involves their products, the real part of the convolution of two complex-valued
signals is, in general, not equal to the convolution of their real parts. However, as
we next show, if one of the signals is real (as is the case in (7.34a)), then things
become simpler: if $\mathbf{x}$ is a complex-valued function of time and if $\mathbf{h}$ is a real-valued
function of time, then

$$\mathrm{Re}(\mathbf{x} \star \mathbf{h}) = \mathrm{Re}(\mathbf{x}) \star \mathbf{h} \quad \text{and} \quad \mathrm{Im}(\mathbf{x} \star \mathbf{h}) = \mathrm{Im}(\mathbf{x}) \star \mathbf{h}, \qquad \mathbf{h} \text{ is real-valued.} \tag{7.38}$$

This follows from the definition of the convolution,

$$(\mathbf{x} \star \mathbf{h})(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau,$$

and from the basic properties of complex integrals (Proposition 2.3.1) by noting
that if $h(\cdot)$ is real-valued, then for all $t, \tau \in \mathbb{R}$,

$$\begin{aligned}
\mathrm{Re}\bigl(x(\tau)\, h(t - \tau)\bigr) &= \mathrm{Re}\bigl(x(\tau)\bigr)\, h(t - \tau), \\
\mathrm{Im}\bigl(x(\tau)\, h(t - \tau)\bigr) &= \mathrm{Im}\bigl(x(\tau)\bigr)\, h(t - \tau).
\end{aligned}$$
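The identity (7.38) carries over verbatim to discrete convolutions, which makes it easy to check. The sketch below is illustrative only; the random test sequences and their lengths are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=64) + 1j * rng.normal(size=64)   # complex-valued signal
h = rng.normal(size=16)                              # real-valued signal

# Discrete analog of (7.38): with h real, Re and Im pass through the convolution.
conv = np.convolve(x, h)
real_ok = np.allclose(conv.real, np.convolve(x.real, h))
imag_ok = np.allclose(conv.imag, np.convolve(x.imag, h))

# With a complex second signal the identity generally fails, because
# Re(x * g) also picks up the term -Im(x) * Im(g).
g = h + 1j * rng.normal(size=16)
real_fails = not np.allclose(np.convolve(x, g).real,
                             np.convolve(x.real, g.real))
print(real_ok, imag_ok, real_fails)
```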

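We next use (7.38) to express the convolution in (7.31a) using real-number operations.
To that end we first note that since $\mathbf{x}_{\mathrm{PB}}$ is real, it follows from Euler's
Identity

$$e^{i\theta} = \cos\theta + i \sin\theta, \quad \theta \in \mathbb{R} \tag{7.39}$$

that

$$\mathrm{Re}\bigl(x_{\mathrm{PB}}(t)\, e^{-i2\pi f_c t}\bigr) = x_{\mathrm{PB}}(t) \cos(2\pi f_c t), \quad t \in \mathbb{R}, \tag{7.40a}$$

$$\mathrm{Im}\bigl(x_{\mathrm{PB}}(t)\, e^{-i2\pi f_c t}\bigr) = -x_{\mathrm{PB}}(t) \sin(2\pi f_c t), \quad t \in \mathbb{R}, \tag{7.40b}$$

so by (7.34a), (7.38), and (7.40)

$$\mathrm{Re}(\mathbf{x}_{\mathrm{BB}}) = \bigl(t \mapsto x_{\mathrm{PB}}(t) \cos(2\pi f_c t)\bigr) \star \mathrm{LPF}_{W_c}, \tag{7.41a}$$

$$\mathrm{Im}(\mathbf{x}_{\mathrm{BB}}) = -\bigl(t \mapsto x_{\mathrm{PB}}(t) \sin(2\pi f_c t)\bigr) \star \mathrm{LPF}_{W_c}. \tag{7.41b}$$

It is common in the engineering literature to refer to the real part of $\mathbf{x}_{\mathrm{BB}}$ as
the in-phase component of $\mathbf{x}_{\mathrm{PB}}$ and to the imaginary part as the quadrature
component of $\mathbf{x}_{\mathrm{PB}}$.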

Definition 7.6.6 (In-Phase and Quadrature Components). The in-phase component
of a real integrable passband signal $\mathbf{x}_{\mathrm{PB}}$ that is bandlimited to W Hz around
the carrier frequency $f_c$ is the real part of its baseband representation, i.e.,

$$\mathrm{Re}(\mathbf{x}_{\mathrm{BB}}) = \bigl(t \mapsto x_{\mathrm{PB}}(t) \cos(2\pi f_c t)\bigr) \star \mathrm{LPF}_{W_c}. \qquad \text{(In-Phase)}$$

The quadrature component is the imaginary part of its baseband representation,
i.e.,

$$\mathrm{Im}(\mathbf{x}_{\mathrm{BB}}) = -\bigl(t \mapsto x_{\mathrm{PB}}(t) \sin(2\pi f_c t)\bigr) \star \mathrm{LPF}_{W_c}. \qquad \text{(Quadrature)}$$

Here $W_c$ is any cutoff frequency in the range $W/2 \le W_c \le 2f_c - W/2$.

Figure 7.11 depicts a block diagram of a circuit that produces the baseband rep- 
resentation of a real passband signal. This circuit will play an important role 
in Chapter 9 when we discuss the Sampling Theorem for passband signals and 
complex sampling. 
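The two-branch structure of Figure 7.11 can be imitated numerically. The sketch below is an illustration under assumptions that are not part of the text: the signal is one period of a periodic sequence, frequencies are measured in DFT bins (`k_c` for the carrier, `half_w` for W/2, `w_c` for the cutoff), and the ideal low-pass filter is applied by masking DFT bins instead of convolving with a sinc kernel.

```python
import numpy as np

N = 4096
n = np.arange(N)
k_c, half_w = 400, 15         # carrier bin and W/2 in bins (assumptions)
rng = np.random.default_rng(2)

# Real passband signal occupying bins k_c - half_w .. k_c + half_w.
bins = np.arange(k_c - half_w, k_c + half_w + 1)
amps = rng.normal(size=bins.size)
phases = rng.uniform(0, 2 * np.pi, size=bins.size)
x_pb = sum(a * np.cos(2 * np.pi * k * n / N + p)
           for k, a, p in zip(bins, amps, phases))

def lowpass(v, cutoff_bins):
    """Ideal low-pass filter: zero every DFT bin with |k| > cutoff_bins."""
    k = np.fft.fftfreq(v.size, d=1.0 / v.size)    # integer bin indices
    V = np.fft.fft(v)
    return np.fft.ifft(np.where(np.abs(k) <= cutoff_bins, V, 0.0))

w_c = 100                     # any cutoff with half_w <= w_c <= 2*k_c - half_w
carrier_phase = 2 * np.pi * k_c * n / N

# The two branches of the circuit, using real-number operations only.
in_phase = lowpass(x_pb * np.cos(carrier_phase), w_c).real
quadrature = lowpass(-x_pb * np.sin(carrier_phase), w_c).real

# Reference: Definition 7.6.1 via the analytic signal.
X_pb = np.fft.fft(x_pb)
x_a = np.fft.ifft(np.where(np.fft.fftfreq(N) >= 0, X_pb, 0.0))
x_bb = np.exp(-1j * carrier_phase) * x_a
print(np.allclose(in_phase, x_bb.real), np.allclose(quadrature, x_bb.imag))
```

The cutoff only has to pass the band around zero and reject the image around twice the carrier, which is why any value in the stated range gives the same outputs.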

7.6.3 Bandwidth Considerations 

The following is a simple but exceedingly important observation regarding bandwidth.
Recall that the bandwidth of $\mathbf{x}_{\mathrm{PB}}$ around the carrier frequency $f_c$ is defined
in Definition 7.3.1 and that the bandwidth of the baseband signal $\mathbf{x}_{\mathrm{BB}}$ is defined
in Definition 6.4.13.

Proposition 7.6.7 ($\mathbf{x}_{\mathrm{PB}}$, $\mathbf{x}_{\mathrm{BB}}$, and Bandwidth). If the real integrable passband
signal $\mathbf{x}_{\mathrm{PB}}$ is of bandwidth W Hz around the carrier frequency $f_c$, then its baseband
representation $\mathbf{x}_{\mathrm{BB}}$ is an integrable signal of bandwidth W/2 Hz.

Proof. This can be seen graphically from Figure 7.9 or from Figure 7.10. It can
be deduced analytically from (7.30). □

[Block diagram: the input $x_{\mathrm{PB}}(t)$ is multiplied by $\cos(2\pi f_c t)$ in one branch and by
the 90°-shifted carrier in the other, producing $x_{\mathrm{PB}}(t)\cos(2\pi f_c t)$ and
$-x_{\mathrm{PB}}(t)\sin(2\pi f_c t)$; each product is fed to a low-pass filter $\mathrm{LPF}_{W_c}$ with
$W/2 \le W_c \le 2f_c - W/2$, yielding $\mathrm{Re}(x_{\mathrm{BB}}(t))$ and $\mathrm{Im}(x_{\mathrm{BB}}(t))$.]

Figure 7.11: Obtaining the baseband representation of a real passband signal.



7.6.4 Recovering xpe from xbb 

Recovering a real passband signal xpb from its baseband representation xbb is 
conceptually simple. We can recover the analytic representation via (7.28) and 
then use Proposition 7.5.3 to recover Xpe: 

Proposition 7.6.8 (From xbb to xpb). Let xpb be a real integrable passband 
signal that is bandlimited to W Hz around the carrier frequency f c , and let xbb be 
its baseband representation. Then, 



and 



XPB(f) =£bb(/-/c) +*Bb(-/-/c)i /€ 



x PB {t) = 2Re(x BB {t)e i2v ^ t ) 1 te 



(7.42a) 



(7.42b) 



The process of recovering x_PB from x_BB is depicted in the frequency domain in
Figure 7.12. It can, of course, also be carried out using real-number operations
only by rewriting (7.42b) as

$$x_{\text{PB}}(t) = 2\operatorname{Re}\bigl(x_{\text{BB}}(t)\bigr)\cos(2\pi f_c t) - 2\operatorname{Im}\bigl(x_{\text{BB}}(t)\bigr)\sin(2\pi f_c t), \quad t \in \mathbb{R}. \tag{7.43}$$
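As a quick numerical sanity check, (7.42b) and (7.43) can be evaluated side by side. The sketch below uses hypothetical function names and an arbitrary sample value standing in for x_BB(t); it only illustrates that the two expressions agree.

```python
import cmath
import math


def passband_complex(x_bb_t, fc, t):
    """Evaluate (7.42b): 2 Re(x_BB(t) e^{i 2 pi f_c t})."""
    return 2 * (x_bb_t * cmath.exp(2j * math.pi * fc * t)).real


def passband_real(x_bb_t, fc, t):
    """Evaluate (7.43) using real-number operations only."""
    angle = 2 * math.pi * fc * t
    return 2 * x_bb_t.real * math.cos(angle) - 2 * x_bb_t.imag * math.sin(angle)


# Assumed values: an arbitrary sample of x_BB and a 10 Hz carrier.
sample = 0.4 - 1.3j
differences = [
    abs(passband_complex(sample, 10.0, t / 100) - passband_real(sample, 10.0, t / 100))
    for t in range(100)
]
```

The real-arithmetic form is the one a hardware up-converter would implement: the in-phase component multiplies a cosine carrier and the quadrature component multiplies a sine carrier.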



It should be emphasized that (7.42b) does not characterize the baseband
representation of x_PB; it is possible that x_PB(t) = 2 Re(z(t) e^{i2πf_c t}) hold at
every time t and that z not be the baseband representation of x_PB. However, as the
next proposition shows, this cannot happen if z is bandlimited to W/2 Hz.

Proposition 7.6.9. Let x_PB be a real integrable passband signal that is bandlimited
to W Hz around the carrier frequency f_c. If the complex signal z satisfies

$$x_{\text{PB}}(t) = 2\operatorname{Re}\bigl(z(t)\, e^{i 2\pi f_c t}\bigr), \quad t \in \mathbb{R}, \tag{7.44}$$

and is an integrable signal that is bandlimited to W/2 Hz, then z is the baseband
representation of x_PB.

Figure 7.12: Recovering a passband signal from its baseband representation. The top
plot is the transform of x_BB; next is the transform of $t \mapsto x_{\text{BB}}(t)\, e^{i 2\pi f_c t}$;
the transform of $x_{\text{BB}}^{*}(t)$; the transform of $t \mapsto x_{\text{BB}}^{*}(t)\, e^{-i 2\pi f_c t}$;
and finally the transform of
$t \mapsto x_{\text{BB}}(t)\, e^{i 2\pi f_c t} + x_{\text{BB}}^{*}(t)\, e^{-i 2\pi f_c t} = 2\operatorname{Re}\bigl(x_{\text{BB}}(t)\, e^{i 2\pi f_c t}\bigr) = x_{\text{PB}}(t)$.

Proof. Since z is bandlimited to W/2 Hz, it follows from Proposition 6.4.10 (cf. (c))
that z must be continuous and that its FT must vanish for |f| > W/2. Consequently,
by Proposition 7.6.5 (cf. (b)), all that remains to show in order to establish
that z is the baseband representation of x_PB is that

$$\hat{z}(f) = \hat{x}_{\text{PB}}(f + f_c), \quad |f| < W/2, \tag{7.45}$$

and this is what we proceed to do. By taking the FT of both sides of (7.44) we
obtain that

$$\hat{x}_{\text{PB}}(f) = \hat{z}(f - f_c) + \hat{z}^{*}(-f - f_c), \quad f \in \mathbb{R}, \tag{7.46}$$

or, upon defining $\tilde{f} = f - f_c$,

$$\hat{x}_{\text{PB}}(\tilde{f} + f_c) = \hat{z}(\tilde{f}) + \hat{z}^{*}(-\tilde{f} - 2 f_c), \quad \tilde{f} \in \mathbb{R}. \tag{7.47}$$

By recalling that f_c > W/2 and that $\hat{z}$ is zero for frequencies f satisfying |f| > W/2,
we obtain that $\hat{z}^{*}(-\tilde{f} - 2 f_c)$ is zero whenever $|\tilde{f}| < W/2$ so

$$\hat{z}(\tilde{f}) + \hat{z}^{*}(-\tilde{f} - 2 f_c) = \hat{z}(\tilde{f}), \quad |\tilde{f}| < W/2. \tag{7.48}$$

Combining (7.47) and (7.48) we obtain

$$\hat{x}_{\text{PB}}(\tilde{f} + f_c) = \hat{z}(\tilde{f}), \quad |\tilde{f}| < W/2,$$

thus establishing (7.45) and hence completing the proof. □

Proposition 7.6.9 is more useful than its appearance may suggest. It provides an
alternative way of computing the baseband representation of a signal. It
demonstrates that if we can use algebra to express x_PB in the form (7.44) for some
signal z, and if we can verify that z is bandlimited to W/2 Hz, then z must be the
baseband representation of x_PB.
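As a worked illustration (an example added here, not taken from the text), suppose that w is a real integrable signal that is bandlimited to W/2 Hz, that f_c > W/2, and that x_PB(t) = w(t) cos(2πf_c t). Writing the cosine as the real part of a complex exponential puts x_PB in the form (7.44):

```latex
\begin{align*}
x_{\text{PB}}(t) &= w(t)\cos(2\pi f_c t) \\
                 &= \operatorname{Re}\bigl(w(t)\, e^{i 2\pi f_c t}\bigr) \\
                 &= 2\operatorname{Re}\Bigl(\tfrac{1}{2}\, w(t)\, e^{i 2\pi f_c t}\Bigr),
                    \quad t \in \mathbb{R}.
\end{align*}
```

Under these assumptions the signal z = w/2 is integrable and bandlimited to W/2 Hz, so Proposition 7.6.9 identifies it as the baseband representation: x_BB(t) = w(t)/2.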

Note that the proof would also work if we replaced the assumption that z is an 
integrable signal that is bandlimited to W/2 Hz with the assumption that z is an 
integrable signal that is bandlimited to f c Hz. 

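7.6.5 Relating ⟨x_PB, y_PB⟩ to ⟨x_BB, y_BB⟩

If x_PB and y_PB are integrable real passband signals that are bandlimited to W Hz
around the carrier frequency f_c, and if x_A, x_BB, y_A, and y_BB are their
corresponding analytic and baseband representations, then, by (7.28),

$$\langle \mathbf{x}_{\text{BB}}, \mathbf{y}_{\text{BB}} \rangle = \langle \mathbf{x}_{\text{A}}, \mathbf{y}_{\text{A}} \rangle, \tag{7.49}$$

because

$$\begin{aligned}
\langle \mathbf{x}_{\text{BB}}, \mathbf{y}_{\text{BB}} \rangle
&= \int_{-\infty}^{\infty} x_{\text{BB}}(t)\, y_{\text{BB}}^{*}(t)\, dt \\
&= \int_{-\infty}^{\infty} e^{-i 2\pi f_c t}\, x_{\text{A}}(t)\, e^{i 2\pi f_c t}\, y_{\text{A}}^{*}(t)\, dt \\
&= \langle \mathbf{x}_{\text{A}}, \mathbf{y}_{\text{A}} \rangle.
\end{aligned}$$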

Combining (7.49) with Proposition 7.5.4 we obtain the following relationship be- 
tween the inner product between two real passband signals and the inner product 
between their corresponding complex baseband representations. 

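Theorem 7.6.10 (⟨x_PB, y_PB⟩ and ⟨x_BB, y_BB⟩). Let x_PB and y_PB be two real
integrable passband signals that are bandlimited to W Hz around the carrier frequency
f_c, and let x_BB and y_BB be their corresponding baseband representations. Then

$$\langle \mathbf{x}_{\text{PB}}, \mathbf{y}_{\text{PB}} \rangle = 2\operatorname{Re}\bigl(\langle \mathbf{x}_{\text{BB}}, \mathbf{y}_{\text{BB}} \rangle\bigr), \tag{7.50}$$

and

$$\|\mathbf{x}_{\text{PB}}\|_2^2 = 2\, \|\mathbf{x}_{\text{BB}}\|_2^2. \tag{7.51}$$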



An extremely important corollary provides a necessary and sufficient condition for 
the inner product between two real passband signals to be zero, i.e., for two real 
passband signals to be orthogonal. 

Corollary 7.6.11 (Characterizing Orthogonal Real Passband Signals). Two
integrable real passband signals x_PB, y_PB that are bandlimited to W Hz around the
carrier frequency f_c are orthogonal if, and only if, the inner product between their
baseband representations is purely imaginary (i.e., of zero real part).

Thus, for two such bandpass signals to be orthogonal their baseband represen- 
tations need not be orthogonal. It suffices that their inner product be purely 
imaginary. 
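A small numerical experiment can make this concrete. The sketch below is only an analog of the statement above: it assumes signals that are periodic with period one second, takes inner products over a single period, and uses hypothetical helper names. The baseband pair is a signal and the same signal multiplied by i, so the baseband inner product is purely imaginary but far from zero, while the passband signals come out orthogonal.

```python
import cmath
import math

N, FC = 128, 16  # assumed: N samples of a one-second period, 16 Hz carrier


def inner(u, v):
    """Inner product over one period, evaluated as a Riemann sum."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v)) / len(u)


def to_passband(z_samples):
    """Apply (7.42b) sample by sample."""
    n = len(z_samples)
    return [2 * (zm * cmath.exp(2j * math.pi * FC * m / n)).real
            for m, zm in enumerate(z_samples)]


def z(t):
    """A baseband test signal with components at 2 Hz, -3 Hz, and 0 Hz."""
    return ((1 + 2j) * cmath.exp(2j * math.pi * 2 * t)
            + 0.5 * cmath.exp(-2j * math.pi * 3 * t)
            + (0.3 - 0.7j))


x_bb = [z(m / N) for m in range(N)]
y_bb = [1j * v for v in x_bb]  # the same signal rotated by 90 degrees

ip_bb = inner(x_bb, y_bb)                             # purely imaginary, nonzero
ip_pb = inner(to_passband(x_bb), to_passband(y_bb))   # zero up to rounding
```

Multiplying a baseband signal by i swaps the roles of the cosine and sine carriers, which is why the two passband signals are orthogonal even though their baseband representations are not.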



7.6.6 The Baseband Representation of x_PB ⋆ y_PB

Proposition 7.6.12 (The Baseband Representation of x_PB ⋆ y_PB Is x_BB ⋆ y_BB).
Let x_PB and y_PB be real integrable passband signals that are bandlimited to W Hz
around the carrier frequency f_c, and let x_BB and y_BB be their baseband
representations. Then the convolution x_PB ⋆ y_PB is a real integrable passband signal
that is bandlimited to W Hz around the carrier frequency f_c and whose baseband
representation is x_BB ⋆ y_BB.

Figure 7.13: The convolution of two real passband signals and its baseband
representation.

Proof. The proof is illustrated in Figure 7.13 on Page 127. All that remains is to
add some technical details. We begin by defining

$$\mathbf{z} = \mathbf{x}_{\text{PB}} \star \mathbf{y}_{\text{PB}}$$

and by noting that, by Proposition 7.2.5, z is an integrable real passband signal
that is bandlimited to W Hz around the carrier frequency f_c and that its FT is
given by

$$\hat{z}(f) = \hat{x}_{\text{PB}}(f)\, \hat{y}_{\text{PB}}(f), \quad f \in \mathbb{R}. \tag{7.52}$$

Thus, it is at least meaningful to discuss the baseband representation of x_PB ⋆ y_PB.

We next note that, by Proposition 7.6.5, both x_BB and y_BB are integrable signals
that are bandlimited to W/2 Hz. Consequently, by Proposition 6.5.2, the
convolution u = x_BB ⋆ y_BB is defined at every epoch t and is also an integrable
signal that is bandlimited to W/2 Hz. Its FT is

$$\hat{u}(f) = \hat{x}_{\text{BB}}(f)\, \hat{y}_{\text{BB}}(f), \quad f \in \mathbb{R}. \tag{7.53}$$

From Proposition 7.6.5 we infer that to prove that u is the baseband representation
of z it only remains to verify that $\hat{u}$ is the mapping $f \mapsto \hat{z}(f + f_c)\, \mathrm{I}\{|f| < W/2\}$,
which, in view of (7.52) and (7.53), is equivalent to showing that

$$\hat{x}_{\text{BB}}(f)\, \hat{y}_{\text{BB}}(f) = \hat{x}_{\text{PB}}(f + f_c)\, \hat{y}_{\text{PB}}(f + f_c)\, \mathrm{I}\{|f| < W/2\}, \quad f \in \mathbb{R}. \tag{7.54}$$

But this follows because the fact that x_BB and y_BB are the baseband
representations of x_PB and y_PB implies that

$$\begin{aligned}
\hat{x}_{\text{BB}}(f) &= \hat{x}_{\text{PB}}(f + f_c)\, \mathrm{I}\{|f| < W/2\}, \quad f \in \mathbb{R}, \\
\hat{y}_{\text{BB}}(f) &= \hat{y}_{\text{PB}}(f + f_c)\, \mathrm{I}\{|f| < W/2\}, \quad f \in \mathbb{R},
\end{aligned}$$

from which (7.54) follows. □
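The proposition can be probed numerically in a periodic setting. The following sketch is an analog rather than an instance of Proposition 7.6.12: it assumes signals of period one second sampled at N points and replaces the convolution over the real line with a convolution over one period. All names are illustrative.

```python
import cmath
import math

N, FC = 64, 16  # assumed: N samples of a one-second period, 16 Hz carrier


def circular_convolve(u, v):
    """Convolution over one period, scaled by the sample spacing 1/N."""
    n = len(u)
    return [sum(u[m] * v[(k - m) % n] for m in range(n)) / n for k in range(n)]


def to_passband(z_samples):
    """Apply (7.42b) sample by sample."""
    n = len(z_samples)
    return [2 * (zm * cmath.exp(2j * math.pi * FC * m / n)).real
            for m, zm in enumerate(z_samples)]


def tone_sum(coeffs, t):
    """Baseband signal sum_k c_k e^{i 2 pi k t} for a dict {k: c_k}."""
    return sum(c * cmath.exp(2j * math.pi * k * t) for k, c in coeffs.items())


x_bb = [tone_sum({2: 1 + 2j, -3: 0.5, 0: 0.3 - 0.7j}, m / N) for m in range(N)]
y_bb = [tone_sum({2: -1j, -3: 2.0, 1: 0.25 + 0.5j}, m / N) for m in range(N)]

# Convolve in passband, then compare with up-converting the baseband convolution.
lhs = circular_convolve(to_passband(x_bb), to_passband(y_bb))
rhs = to_passband(circular_convolve(x_bb, y_bb))
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))
```

The practical upshot is that filtering-type operations on passband signals can be simulated entirely at baseband, where far fewer samples are needed.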



7.6.7 The Baseband Representation of x_PB ⋆ h

We next study the result of passing a real integrable passband signal x_PB that is
bandlimited to W Hz around the carrier frequency f_c through a real stable filter
of impulse response h. Our focus is on the baseband representation of the result.

Proposition 7.6.13 (Baseband Representation of x_PB ⋆ h). Let x_PB be a real
integrable passband signal that is bandlimited to W Hz around the carrier frequency
f_c, and let h be a real integrable signal. Then x_PB ⋆ h is defined at every time
instant; it is a real integrable passband signal that is bandlimited to W Hz around
the carrier frequency f_c; and its baseband representation is of FT

$$f \mapsto \hat{x}_{\text{BB}}(f)\, \hat{h}(f + f_c), \quad f \in \mathbb{R}, \tag{7.55}$$

where x_BB is the baseband representation of x_PB.

Proof. That the convolution x_PB ⋆ h is defined at every time instant follows from
Proposition 7.2.5. Defining y = x_PB ⋆ h we have by the same proposition that y is
a real integrable passband signal that is bandlimited to W Hz around the carrier
frequency f_c and that its FT is given by

$$\hat{y}(f) = \hat{x}_{\text{PB}}(f)\, \hat{h}(f), \quad f \in \mathbb{R}. \tag{7.56}$$

Applying Proposition 7.6.5 (cf. (b)) to the signal y we obtain that the baseband
representation of y is of FT

$$f \mapsto \hat{x}_{\text{PB}}(f + f_c)\, \hat{h}(f + f_c)\, \mathrm{I}\{|f| < W/2\}, \quad f \in \mathbb{R}. \tag{7.57}$$

To conclude the proof it thus remains to establish that the mappings (7.57) and
(7.55) are identical. But this follows because, by Proposition 7.6.5 (cf. (b)) applied
to the signal x_PB,

$$\hat{x}_{\text{BB}}(f) = \hat{x}_{\text{PB}}(f + f_c)\, \mathrm{I}\{|f| < W/2\}, \quad f \in \mathbb{R}. \qquad \Box$$



Motivated by Proposition 7.6.13 we put forth the following definition. 

Definition 7.6.14 (Frequency Response with Respect to a Band). For a stable
real filter of impulse response h we define the frequency response with respect
to the bandwidth W around the carrier frequency f_c (satisfying f_c > W/2)
as the mapping

$$f \mapsto \hat{h}(f + f_c)\, \mathrm{I}\{|f| < W/2\}. \tag{7.58}$$

Figure 7.14 illustrates the relationship between the frequency response of a real
filter and its response with respect to the carrier frequency f_c and bandwidth W.
Heuristically, we can think of the frequency response with respect to the
bandwidth W around the carrier frequency f_c of a filter of real impulse response h as
the FT of the baseband representation of h ⋆ BPF_{W,f_c}.²

With the aid of Definition 7.6.14 we can restate Proposition 7.6.13 as stating that
the baseband representation of the result of passing a real integrable passband
signal that is bandlimited to W Hz around the carrier frequency f_c through a
stable real filter is the product of the FT of the baseband representation of the
signal by the frequency response with respect to the bandwidth W around the
carrier frequency f_c of the filter. This relationship is illustrated in Figures 7.15
and 7.16. The former depicts the product of the FT of a real passband signal x_PB
and the frequency response of a real filter h. The latter depicts the product of the
baseband representation x_BB of x_PB by the frequency response of h with respect
to the bandwidth W around the carrier frequency f_c.
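The mapping of Definition 7.6.14 is easy to express in code. The sketch below is illustrative only: `response_wrt_band` is a hypothetical helper, and the first-order filter `rc_lowpass` (with its time constant) is an assumed example that does not come from the text.

```python
import math


def response_wrt_band(h_hat, fc, w):
    """Return the mapping f -> h_hat(f + fc) * I{|f| < w/2}, mirroring (7.58)."""
    if not fc > w / 2:
        raise ValueError("Definition 7.6.14 requires fc > w/2")

    def response(f):
        return h_hat(f + fc) if abs(f) < w / 2 else 0.0

    return response


def rc_lowpass(f, tau=1e-3):
    """Assumed example filter: FT of h(t) = exp(-t/tau)/tau for t >= 0."""
    return 1.0 / (1.0 + 2j * math.pi * f * tau)


# Frequency response of the filter as seen by a 200 Hz band around 1000 Hz.
band_response = response_wrt_band(rc_lowpass, fc=1000.0, w=200.0)
```

Multiplying the FT of x_BB by `band_response` then gives, per Proposition 7.6.13, the FT of the baseband representation of the filter output.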

The relationships between some of the properties of x_PB, x_A, and x_BB are
summarized in Table 7.1 on Page 142.

² This is mathematically somewhat problematic because h ⋆ BPF_{W,f_c} need not be an
integrable signal. But this can be remedied because h ⋆ BPF_{W,f_c} is an energy-limited
passband signal that is bandlimited to W Hz around the carrier frequency, and, as
such, also has a baseband representation; see Section 7.7.

Figure 7.14: A real filter's frequency response (top) and its frequency response
with respect to the bandwidth W around the carrier frequency f_c (bottom).



7.7 Energy-Limited Passband Signals

We next repeat the results of this chapter under the weaker assumption that the
passband signal is energy-limited and not necessarily integrable. The key results
require only minor adjustments, and most of the derivations are almost identical
and are therefore omitted. The reader is encouraged to focus on the results and to
read the proofs only if needed.



7.7.1 Characterization of Energy-Limited Passband Signals 

Recall that energy-limited passband signals were defined in Definition 7.2.1 as 
energy-limited signals that are unaltered by bandpass filtering. In this subsec- 
tion we shall describe alternative characterizations. Aiding us in the character- 
ization is the following lemma, which can be viewed as the passband analog of 
Lemma 6.4.4 (i). 

Lemma 7.7.1. Let x be an energy-limited signal, and let f_c > W/2 > 0 be given.
Then the signal x ⋆ BPF_{W,f_c} can be expressed as

$$\bigl(\mathbf{x} \star \text{BPF}_{W,f_c}\bigr)(t) = \int_{||f| - f_c| < W/2} \hat{x}(f)\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}; \tag{7.59}$$

it is of finite energy; and its L2-Fourier Transform is (the equivalence class of) the
mapping $f \mapsto \hat{x}(f)\, \mathrm{I}\{||f| - f_c| < W/2\}$.

Figure 7.15: The FT of a passband signal (top); the frequency response of a real
filter (middle); and their product (bottom).

Proof. The lemma follows from Lemma 6.4.4 (ii) by substituting for g the mapping
$f \mapsto \mathrm{I}\{||f| - f_c| < W/2\}$, whose IFT is BPF_{W,f_c}. □



In analogy to Proposition 6.4.5 we can characterize energy-limited passband signals 
as follows. 

Proposition 7.7.2 (Characterizations of Passband Signals in $\mathcal{L}_2$).

(i) If x is an energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency f_c, then it can be expressed in the form

$$x(t) = \int_{||f| - f_c| < W/2} g(f)\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}, \tag{7.60}$$

for some mapping $g \colon f \mapsto g(f)$ satisfying

$$\int_{||f| - f_c| < W/2} |g(f)|^2\, df < \infty \tag{7.61}$$

that can be taken as (any function in the equivalence class of) $\hat{x}$.

(ii) If a signal x can be expressed as in (7.60) for some function g satisfying
(7.61), then x is an energy-limited passband signal that is bandlimited to W
Hz around the carrier frequency f_c and its FT $\hat{x}$ is (the equivalence class of)
the mapping $f \mapsto g(f)\, \mathrm{I}\{||f| - f_c| < W/2\}$.

Figure 7.16: The FT of the baseband representation of the passband signal x_PB of
Figure 7.15 (top); the frequency response with respect to the bandwidth W around
the carrier frequency f_c of the filter of Figure 7.15 (middle); and their product
(bottom).

Proof. The proof of Part (i) follows from Definition 7.2.1 and from Lemma 7.7.1 in 
very much the same way as Part (i) of Proposition 6.4.5 follows from Definition 6.4.1 
and Lemma 6.4.4 (i). 

The proof of Part (ii) is analogous to the proof of Part (ii) of Proposition 6.4.5. □ 

As a corollary we obtain the analog of Corollary 7.2.3: 

Corollary 7.7.3 (Passband Signals Are Bandlimited). If x_PB is an energy-limited
passband signal that is bandlimited to W Hz around the carrier frequency f_c, then
it is an energy-limited signal that is bandlimited to f_c + W/2 Hz.

Proof. If x_PB is an energy-limited passband signal that is bandlimited to W Hz
around the carrier frequency f_c, then, by Proposition 7.7.2 (i), there exists a
function $g \colon f \mapsto g(f)$ satisfying (7.61) such that x_PB is given by (7.60). But this
implies that the function $f \mapsto g(f)\, \mathrm{I}\{||f| - f_c| < W/2\}$ is an energy-limited
function such that

$$x_{\text{PB}}(t) = \int_{-f_c - W/2}^{f_c + W/2} g(f)\, \mathrm{I}\{||f| - f_c| < W/2\}\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}, \tag{7.62}$$

so, by Proposition 6.4.5 (ii), x_PB is an energy-limited signal that is bandlimited to
f_c + W/2 Hz. □

The following is the analog of Proposition 6.4.6.

Proposition 7.7.4.

(i) If x_PB is an energy-limited passband signal that is bandlimited to W Hz
around the carrier frequency f_c, then x_PB is a continuous function and all
its energy is contained in the frequencies f satisfying ||f| - f_c| < W/2 in the
sense that

$$\int_{-\infty}^{\infty} |\hat{x}_{\text{PB}}(f)|^2\, df = \int_{||f| - f_c| < W/2} |\hat{x}_{\text{PB}}(f)|^2\, df. \tag{7.63}$$

(ii) If x_PB ∈ $\mathcal{L}_2$ satisfies (7.63), then x_PB is indistinguishable from the signal
x_PB ⋆ BPF_{W,f_c}, which is an energy-limited passband signal that is bandlimited
to W Hz around f_c. If in addition to satisfying (7.63) the signal x_PB is
continuous, then x_PB is an energy-limited passband signal that is bandlimited
to W Hz around the carrier frequency f_c.

Proof. This proposition's claims are a subset of those of Proposition 7.7.5, which
summarizes some of the results related to bandpass filtering. □

Proposition 7.7.5. Let y = x ⋆ BPF_{W,f_c} be the result of feeding the signal
x ∈ $\mathcal{L}_2$ to an ideal unit-gain bandpass filter of bandwidth W around the carrier
frequency f_c. Assume f_c > W/2. Then:

(i) y is energy-limited with

$$\|\mathbf{y}\|_2 \le \|\mathbf{x}\|_2. \tag{7.64}$$

(ii) y is an energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency f_c.

(iii) The L2-Fourier Transform of y is (the equivalence class of) the mapping
$f \mapsto \hat{x}(f)\, \mathrm{I}\{||f| - f_c| < W/2\}$.

(iv) All the energy in y is concentrated in the frequencies {f : ||f| - f_c| < W/2}
in the sense that

$$\int_{-\infty}^{\infty} |\hat{y}(f)|^2\, df = \int_{||f| - f_c| < W/2} |\hat{y}(f)|^2\, df.$$

(v) y can be represented as

\begin{align}
y(t) &= \int_{-\infty}^{\infty} \hat{y}(f)\, e^{i 2\pi f t}\, df \tag{7.65} \\
     &= \int_{||f| - f_c| < W/2} \hat{x}(f)\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{7.66}
\end{align}

(vi) y is uniformly continuous.

(vii) If all the energy of x is concentrated in the frequencies {f : ||f| - f_c| < W/2}
in the sense that

$$\int_{-\infty}^{\infty} |\hat{x}(f)|^2\, df = \int_{||f| - f_c| < W/2} |\hat{x}(f)|^2\, df, \tag{7.67}$$

then x is indistinguishable from the passband signal x ⋆ BPF_{W,f_c}.

(viii) z is an energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency f_c if, and only if, it satisfies all three of the following
conditions: it is in $\mathcal{L}_2$; it is continuous; and all its energy is concentrated in
the passband frequencies {f : ||f| - f_c| < W/2}.

Proof. The proof is very similar to the proof of Proposition 6.4.7 and is thus
omitted. □

7.7.2 The Analytic Representation

If x_PB is a real energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency f_c, then we define its analytic representation via (7.11). (Since
x_PB ∈ $\mathcal{L}_2$, it follows from Parseval's Theorem that $\hat{x}_{\text{PB}}$ is energy-limited so, by
Proposition 3.4.3, the mapping $f \mapsto \hat{x}_{\text{PB}}(f)\, \mathrm{I}\{|f - f_c| < W/2\}$ is integrable and
the integral (7.11) is defined for every t ∈ ℝ. Also, the integral does not depend
on which element of the equivalence class consisting of the L2-Fourier Transform
of x_PB it is applied to.)

In analogy to Proposition 7.5.2 we can characterize the analytic representation as
follows.

Proposition 7.7.6 (Characterizing the Analytic Representation of x_PB ∈ $\mathcal{L}_2$).
Let x_PB be a real energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency f_c. Then each of the following statements is equivalent to the
statement that the complex signal x_A is the analytic representation of x_PB:

(a) The signal x_A is given by

$$x_{\text{A}}(t) = \int_{f_c - W/2}^{f_c + W/2} \hat{x}_{\text{PB}}(f)\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{7.68}$$

(b) The signal x_A is a continuous energy-limited signal whose L2-Fourier
Transform $\hat{x}_{\text{A}}$ is (the equivalence class of) the mapping

$$f \mapsto \hat{x}_{\text{PB}}(f)\, \mathrm{I}\{f > 0\}. \tag{7.69}$$

(c) The signal x_A is an energy-limited passband signal that is bandlimited to W
Hz around the carrier frequency f_c and whose L2-Fourier Transform is (the
equivalence class of) the mapping in (7.69).

(d) The signal x_A is given by

$$\mathbf{x}_{\text{A}} = \mathbf{x}_{\text{PB}} \star \check{g} \tag{7.70}$$

where $g \colon f \mapsto g(f)$ is any function in $\mathcal{L}_1 \cap \mathcal{L}_2$ satisfying

$$g(f) = 1, \quad |f - f_c| < W/2, \tag{7.71a}$$

and

$$g(f) = 0, \quad |f + f_c| < W/2. \tag{7.71b}$$

Proof. The proof is not very difficult and is omitted. □

We note that the reconstruction formula (7.21b) continues to hold also when x_PB
is an energy-limited signal that is bandlimited to W Hz around the carrier
frequency f_c.

7.7.3 The Baseband Representation of x_PB ∈ $\mathcal{L}_2$

Having defined the analytic representation, we now use (7.28) to define the
baseband representation.

As in Proposition 7.6.3, we can also describe a procedure for obtaining the
baseband representation of a passband signal without having to go via the analytic
representation.

Proposition 7.7.7 (From x_PB ∈ $\mathcal{L}_2$ to x_BB Directly). If x_PB is a real
energy-limited passband signal that is bandlimited to W Hz around the carrier
frequency f_c, then its baseband representation x_BB is given by

$$\mathbf{x}_{\text{BB}} = \bigl(t \mapsto e^{-i 2\pi f_c t}\, x_{\text{PB}}(t)\bigr) \star \check{g}_0, \tag{7.72}$$

where $g_0 \colon f \mapsto g_0(f)$ is any function in $\mathcal{L}_1 \cap \mathcal{L}_2$ satisfying

$$g_0(f) = 1, \quad |f| < W/2, \tag{7.73a}$$

and

$$g_0(f) = 0, \quad |f + 2 f_c| < W/2. \tag{7.73b}$$

Proof. The proof is very similar to the proof of Proposition 7.6.3 and is omitted. □

The following proposition, which is the analog of Proposition 7.6.5, characterizes
the baseband representation of energy-limited passband signals.

Proposition 7.7.8 (Characterizing the Baseband Representation of x_PB ∈ $\mathcal{L}_2$).
Let x_PB be a real energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency f_c. Then each of the following statements is equivalent to the
statement that the complex signal x_BB is the baseband representation of x_PB:

(a) The signal x_BB is given by

$$x_{\text{BB}}(t) = \int_{-W/2}^{W/2} \hat{x}_{\text{PB}}(f + f_c)\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{7.74}$$

(b) The signal x_BB is a continuous energy-limited signal whose L2-Fourier
Transform is (the equivalence class of) the mapping

$$f \mapsto \hat{x}_{\text{PB}}(f + f_c)\, \mathrm{I}\{|f| < W/2\}. \tag{7.75}$$

(c) The signal x_BB is an energy-limited signal that is bandlimited to W/2 Hz
and whose L2-Fourier Transform is (the equivalence class of) the mapping
(7.75).

(d) The signal x_BB is given by (7.72) for any mapping $g_0 \colon f \mapsto g_0(f)$ satisfying
(7.73).

The in-phase component and the quadrature component of an energy-limited
passband signal are defined, as in the integrable case, as the real and imaginary
parts of its baseband representation.

Proposition 7.6.7, which asserts that the bandwidth of x_BB is half the bandwidth
of x_PB, continues to hold, as does the reconstruction formula (7.42b).
Proposition 7.6.9 also extends to energy-limited signals. We repeat it (in a slightly more
general way) for future reference.

Proposition 7.7.9.

(i) If z is an energy-limited signal that is bandlimited to W/2 Hz, and if the
signal x is given by

$$x(t) = 2\operatorname{Re}\bigl(z(t)\, e^{i 2\pi f_c t}\bigr), \quad t \in \mathbb{R}, \tag{7.76}$$

where f_c > W/2, then x is a real energy-limited passband signal that is
bandlimited to W Hz around f_c, and z is its baseband representation.

(ii) If x is an energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency f_c and if (7.76) holds for some energy-limited signal z
that is bandlimited to f_c Hz, then z is the baseband representation of x and
is, in fact, bandlimited to W/2 Hz.

Proof. Omitted. □

Identity (7.50) relating the inner products ⟨x_PB, y_PB⟩ and ⟨x_BB, y_BB⟩ continues to
hold for energy-limited passband signals that are not necessarily integrable.

Proposition 7.6.12 does not hold for energy-limited signals, because the convolution
of two energy-limited signals need not be energy-limited. But if we assume that at
least one of the signals is also integrable, then things sail through. Consequently,
using Corollary 7.2.4 we obtain:

Proposition 7.7.10 (The Baseband Representation of x_PB ⋆ y_PB Is x_BB ⋆ y_BB).
Let x_PB be a real integrable passband signal that is bandlimited to W Hz around
the carrier frequency f_c, and let y_PB be a real energy-limited passband signal that
is bandlimited to W Hz around the carrier frequency f_c. Let x_BB and y_BB be their
corresponding baseband representations. Then x_PB ⋆ y_PB is a real energy-limited
signal that is bandlimited to W Hz around the carrier frequency f_c and whose
baseband representation is x_BB ⋆ y_BB.

Proposition 7.6.13 too requires only a slight modification to address energy-limited
signals.

Proposition 7.7.11 (Baseband Representation of x_PB ⋆ h). Let x_PB be a real
energy-limited passband signal that is bandlimited to W Hz around the carrier
frequency f_c, and let h be a real integrable signal. Then x_PB ⋆ h is defined at every
time instant; it is a real energy-limited passband signal that is bandlimited to W
Hz around the carrier frequency f_c; and its baseband representation is given by

$$(\mathbf{h} \star \mathbf{x}_{\text{PB}})_{\text{BB}} = \mathbf{h}_{\text{BB}} \star \mathbf{x}_{\text{BB}}, \tag{7.77}$$

where h_BB is the baseband representation of the energy-limited signal h ⋆ BPF_{W,f_c}.
The L2-Fourier Transform of the baseband representation of x_PB ⋆ h is (the
equivalence class of) the mapping

$$f \mapsto \hat{x}_{\text{BB}}(f)\, \hat{h}(f + f_c), \quad f \in \mathbb{R}, \tag{7.78}$$

where x_BB is the baseband representation of x_PB.

The following theorem summarizes some of the properties of the baseband
representation of energy-limited passband signals.

Theorem 7.7.12 (Properties of the Baseband Representation).

(i) The mapping $\mathbf{x}_{\text{PB}} \mapsto \mathbf{x}_{\text{BB}}$ that maps every real energy-limited passband signal
that is bandlimited to W Hz around the carrier frequency f_c to its baseband
representation is a one-to-one mapping onto the space of complex
energy-limited signals that are bandlimited to W/2 Hz.

(ii) The mapping $\mathbf{x}_{\text{PB}} \mapsto \mathbf{x}_{\text{BB}}$ is linear in the sense that if x_PB and y_PB are
real energy-limited passband signals that are bandlimited to W Hz around
the carrier frequency f_c, and if x_BB and y_BB are their corresponding
baseband representations, then for every α, β ∈ ℝ, the baseband representation of
α x_PB + β y_PB is α x_BB + β y_BB:

$$(\alpha\, \mathbf{x}_{\text{PB}} + \beta\, \mathbf{y}_{\text{PB}})_{\text{BB}} = \alpha\, \mathbf{x}_{\text{BB}} + \beta\, \mathbf{y}_{\text{BB}}, \quad \alpha, \beta \in \mathbb{R}. \tag{7.79}$$

(iii) The mapping $\mathbf{x}_{\text{PB}} \mapsto \mathbf{x}_{\text{BB}}$ is, to within a factor of two, energy preserving
in the sense that

$$\|\mathbf{x}_{\text{PB}}\|_2^2 = 2\, \|\mathbf{x}_{\text{BB}}\|_2^2. \tag{7.80}$$

(iv) Inner products are related via

$$\langle \mathbf{x}_{\text{PB}}, \mathbf{y}_{\text{PB}} \rangle = 2\operatorname{Re}\bigl(\langle \mathbf{x}_{\text{BB}}, \mathbf{y}_{\text{BB}} \rangle\bigr), \tag{7.81}$$

for x_PB and y_PB as above.

(v) The (baseband) bandwidth of x_BB is half the bandwidth of x_PB around the
carrier frequency f_c.

(vi) The baseband representation x_BB can be expressed in terms of x_PB as

$$\mathbf{x}_{\text{BB}} = \bigl(t \mapsto e^{-i 2\pi f_c t}\, x_{\text{PB}}(t)\bigr) \star \text{LPF}_{W_c} \tag{7.82a}$$

where W_c is any cutoff frequency satisfying

$$W/2 < W_c < 2 f_c - W/2. \tag{7.82b}$$

(vii) The real passband signal x_PB can be expressed in terms of its baseband
representation x_BB as

$$x_{\text{PB}}(t) = 2\operatorname{Re}\bigl(x_{\text{BB}}(t)\, e^{i 2\pi f_c t}\bigr), \quad t \in \mathbb{R}. \tag{7.83}$$

(viii) If h is a real integrable signal, and if x_PB is as above, then h ⋆ x_PB is a real
energy-limited passband signal that is bandlimited to W Hz around the carrier
frequency f_c, and its baseband representation is given by

$$(\mathbf{h} \star \mathbf{x}_{\text{PB}})_{\text{BB}} = \mathbf{h}_{\text{BB}} \star \mathbf{x}_{\text{BB}}, \tag{7.84}$$

where h_BB is the baseband representation of the energy-limited real signal
h ⋆ BPF_{W,f_c}.



7.8 Shifting to Passband and Convolving 

The following result is almost trivial if you think about its interpretation in the
frequency domain. To that end, it is good to focus on the case where the signal x
is a bandlimited baseband signal and where f_c is positive and large. In this case
we can interpret the LHS of (7.85) as the result of taking the baseband signal x,
up-converting it to passband by forming the signal $\tau \mapsto x(\tau)\, e^{i 2\pi f_c \tau}$, and then
convolving the result with h. The RHS corresponds to down-converting h to form
the signal $\tau \mapsto e^{-i 2\pi f_c \tau}\, h(\tau)$, then convolving this signal with x, and then
up-converting the final result.

Proposition 7.8.1. Suppose that f_c ∈ ℝ and that (at least) one of the following
conditions holds:

1) The signal x is a measurable bounded signal and h ∈ $\mathcal{L}_1$.

2) Both x and h are in $\mathcal{L}_2$.

Then, at every epoch t ∈ ℝ,

$$\Bigl(\bigl(\tau \mapsto x(\tau)\, e^{i 2\pi f_c \tau}\bigr) \star \mathbf{h}\Bigr)(t) = e^{i 2\pi f_c t}\, \Bigl(\mathbf{x} \star \bigl(\tau \mapsto e^{-i 2\pi f_c \tau}\, h(\tau)\bigr)\Bigr)(t). \tag{7.85}$$

Proof. We evaluate the LHS of (7.85) using the definition of the convolution:

$$\begin{aligned}
\Bigl(\bigl(\tau \mapsto x(\tau)\, e^{i 2\pi f_c \tau}\bigr) \star \mathbf{h}\Bigr)(t)
&= \int_{-\infty}^{\infty} x(\tau)\, e^{i 2\pi f_c \tau}\, h(t - \tau)\, d\tau \\
&= e^{i 2\pi f_c t}\, e^{-i 2\pi f_c t} \int_{-\infty}^{\infty} x(\tau)\, e^{i 2\pi f_c \tau}\, h(t - \tau)\, d\tau \\
&= e^{i 2\pi f_c t} \int_{-\infty}^{\infty} x(\tau)\, e^{-i 2\pi f_c (t - \tau)}\, h(t - \tau)\, d\tau \\
&= e^{i 2\pi f_c t}\, \Bigl(\mathbf{x} \star \bigl(\tau \mapsto e^{-i 2\pi f_c \tau}\, h(\tau)\bigr)\Bigr)(t). \qquad \Box
\end{aligned}$$
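The same algebra goes through for finite sequences, which gives a simple way to check (7.85) numerically. The sketch below is a discrete-time analog with assumed sample values; `omega` plays the role of 2πf_c times the sample spacing, and the function names are hypothetical.

```python
import cmath


def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


def modulate(seq, omega):
    """Multiply sample n by e^{i omega n}."""
    return [v * cmath.exp(1j * omega * n) for n, v in enumerate(seq)]


def shift_then_convolve(x, h, omega):
    """Discrete analog of the LHS of (7.85): up-convert x, then filter with h."""
    return convolve(modulate(x, omega), h)


def convolve_then_shift(x, h, omega):
    """Discrete analog of the RHS of (7.85): down-convert h, filter, up-convert."""
    return modulate(convolve(x, modulate(h, -omega)), omega)


x = [1.0, -2.0, 0.5, 3.0, 0.25]
h = [0.5, 0.25, -1.0]
omega = 0.7
lhs = shift_then_convolve(x, h, omega)
rhs = convolve_then_shift(x, h, omega)
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))
```

In simulations the right-hand form is usually preferable, because the convolution is then carried out between two slowly varying sequences.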



7.9 Mathematical Comments 

The analytic representation is related to the Hilbert Transform; see, for example,
(Pinsky, 2002, Section 3.4). In our proof that x_A is integrable whenever x_PB is
integrable we implicitly exploited the fact that the strict inequality f_c > W/2
implies that for the class of integrable passband signals that are bandlimited to W
Hz around the carrier frequency f_c there exist Hilbert Transform kernels that are
integrable. See, for example, (Logan, 1978, Section 2.5).

7.10 Exercises 

Exercise 7.1 (Purely Real and Purely Imaginary Baseband Representations). Let x_PB
be a real integrable passband signal that is bandlimited to W Hz around the carrier
frequency f_c, and let x_BB be its baseband representation.

(i) Show that x_BB is real if, and only if, $\hat{x}_{\text{PB}}$ satisfies

$$\hat{x}_{\text{PB}}(f_c - \delta) = \hat{x}_{\text{PB}}^{*}(f_c + \delta), \quad |\delta| < \frac{W}{2}.$$

(ii) Show that x_BB is imaginary if, and only if,

$$\hat{x}_{\text{PB}}(f_c - \delta) = -\hat{x}_{\text{PB}}^{*}(f_c + \delta), \quad |\delta| < \frac{W}{2}.$$

Exercise 7.2 (Symmetry around the Carrier Frequency). Let x_PB be a real integrable
passband signal that is bandlimited to W Hz around the carrier frequency f_c.

(i) Show that x_PB can be written in the form

$$x_{\text{PB}}(t) = w(t)\cos(2\pi f_c t)$$

where w(·) is a real integrable signal that is bandlimited to W/2 Hz if, and only if,

$$\hat{x}_{\text{PB}}(f_c + \delta) = \hat{x}_{\text{PB}}^{*}(f_c - \delta), \quad |\delta| < \frac{W}{2}.$$

(ii) Show that x_PB can be written in the form

$$x_{\text{PB}}(t) = w(t)\sin(2\pi f_c t), \quad t \in \mathbb{R}$$

for w(·) as above if, and only if,

$$\hat{x}_{\text{PB}}(f_c + \delta) = -\hat{x}_{\text{PB}}^{*}(f_c - \delta), \quad |\delta| < \frac{W}{2}.$$

Exercise 7.3 (Viewing a Baseband Signal as a Passband Signal). Let x be a real integrable
signal that is bandlimited to W Hz. Show that if we had informally allowed equality in
(7.1b) and if we had allowed equality between f_c and W/2 in (5.21), then we could have
viewed x also as a real integrable passband signal that is bandlimited to W Hz around
the carrier frequency f_c = W/2. Viewed as such, what would have been its complex
baseband representation?

Exercise 7.4 (Bandwidth of the Product of Two Signals). Let x be a real energy-limited
signal that is bandlimited to W_x Hz. Let y be a real energy-limited passband signal that
is bandlimited to W_y Hz around the carrier frequency f_c. Show that if f_c > W_x + W_y/2,
then the signal $t \mapsto x(t)\, y(t)$ is a real integrable passband signal that is bandlimited to
2W_x + W_y Hz around the carrier frequency f_c.

Exercise 7.5 (Phase Shift). Let x be a real integrable signal that is bandlimited to W Hz.
Let f_c be larger than W.

(i) Express the baseband representation of the real passband signal

$$z_{\text{PB}}(t) = x(t)\sin(2\pi f_c t + \phi), \quad t \in \mathbb{R}$$

in terms of x(·) and φ.

(ii) Compute the Fourier Transform of z_PB.

Exercise 7.6 (Energy of a Passband Signal). Let x ∈ $\mathcal{L}_2$ be of energy $\|\mathbf{x}\|_2^2$.

(i) What is the approximate energy in $t \mapsto x(t)\cos(2\pi f_c t)$ if f_c is very large?

(ii) Is your answer exact if x(·) is an energy-limited signal that is bandlimited to W Hz,
where W < f_c?

Hint: In Part (i) approximate x as being constant over the periods of $t \mapsto \cos(2\pi f_c t)$.
For Part (ii) see also Problem 6.13.

Exercise 7.7 (Differences in Passband). Let x_PB and y_PB be real energy-limited passband
signals that are bandlimited to W Hz around the carrier frequency f_c. Let x_BB and y_BB
be their baseband representations. Find the relationship between

$$\int_{-\infty}^{\infty} \bigl(x_{\text{PB}}(t) - y_{\text{PB}}(t)\bigr)^2\, dt \quad \text{and} \quad \int_{-\infty}^{\infty} \bigl|x_{\text{BB}}(t) - y_{\text{BB}}(t)\bigr|^2\, dt.$$



Exercise 7.8 (Reflection of Passband Signal). Let x_PB and y_PB be real integrable
passband signals that are bandlimited to W Hz around the carrier frequency f_c. Let x_BB
and y_BB be their baseband representations.

(i) Express the baseband representation of the reflected signal $t \mapsto x_{\text{PB}}(-t)$ in terms of x_BB.

(ii) Express ⟨x_PB, y_PB⟩ in terms of x_BB and y_BB.

Exercise 7.9 (Deducing x_BB). Let x_PB be a real integrable passband signal that is
bandlimited to W Hz around the carrier frequency f_c. Show that it is possible that x_PB(t) be
given at every epoch t ∈ ℝ by 2 Re(z(t) e^{i2πf_c t}) for some complex signal z(t) and that z
not be the baseband representation of x_PB. Does this contradict Proposition 7.6.9?

Chapter 8

Complete Orthonormal Systems and the
Sampling Theorem



8.1 Introduction

Like Chapter 4, this chapter deals with the geometry of the space $\mathcal{L}_2$ of
energy-limited signals. Here, however, our focus is on infinite-dimensional linear subspaces
of $\mathcal{L}_2$ and on the notion of a complete orthonormal system (CONS). As an
application of this geometric picture, we shall present the Sampling Theorem as
an orthonormal expansion with respect to a CONS for the space of energy-limited
signals that are bandlimited to W Hz.



8.2 Complete Orthonormal System 

Recall that we denote by C2 the space of all measurable signals u : R — » C satisfying 

\u(t)\ 2 dt < 00. 



Also recall from Section 4.3 that a subset U of C2 is said to be a linear subspace of 
C2 if U is nonempty and if the signal aui + /3u2 is in VI whenever ui , U2 GM and 
a, (3 € C. A linear subspace is said to be finite-dimensional if there exists a finite 
number of signals that span it; otherwise, it is said to be infinite-dimensional. The 
following are some examples of infinite-dimensional linear subspaces of £2 ■ 

(i) The set of all functions of the form $t \mapsto p(t)\, e^{-|t|}$, where $p(t)$ is any polynomial
(of arbitrary degree).

(ii) The set of all energy-limited signals that vanish outside the interval $[-1, 1]$
(i.e., that map every $t$ outside this interval to zero).

(iii) The set of all energy-limited signals that vanish outside some unspecified
finite interval (i.e., the set containing all signals $\mathbf{u}$ for which there exist
some $a, b \in \mathbb{R}$ (depending on $\mathbf{u}$) such that $u(t) = 0$ whenever $t \notin [a, b]$).

(iv) The set of all energy-limited signals that are bandlimited to W Hz.

While a basis for an infinite-dimensional subspace can be defined,¹ this notion does
not turn out to be very useful for our purposes. Much more useful to us is the
notion of a complete orthonormal system, which we shall define shortly.²

To motivate the definition, consider a bi-infinite sequence $\ldots, \phi_{-1}, \phi_0, \phi_1, \phi_2, \ldots$
in $\mathcal{L}_2$ satisfying the orthonormality condition

\[ \langle \phi_\ell, \phi_{\ell'} \rangle = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}, \tag{8.1} \]

and let $\mathbf{u}$ be an arbitrary element of $\mathcal{L}_2$. Define the signals

\[ \mathbf{u}_L = \sum_{\ell=-L}^{L} \langle \mathbf{u}, \phi_\ell \rangle\, \phi_\ell, \quad L = 1, 2, \ldots \tag{8.2} \]

By Note 4.6.7, $\mathbf{u}_L$ is the projection of the vector $\mathbf{u}$ onto the subspace spanned
by $(\phi_{-L}, \ldots, \phi_L)$. By the orthonormality (8.1), the tuple $(\phi_{-L}, \ldots, \phi_L)$ is an
orthonormal basis for this subspace. Consequently, by Proposition 4.6.9,

\[ \|\mathbf{u}\|_2^2 \ge \sum_{\ell=-L}^{L} \bigl|\langle \mathbf{u}, \phi_\ell \rangle\bigr|^2, \quad L = 1, 2, \ldots, \tag{8.3} \]

with equality if, and only if, $\mathbf{u}$ is indistinguishable from some linear combination
of $(\phi_{-L}, \ldots, \phi_L)$. This motivates us to explore the situation where (8.3) holds
with equality when $L \to \infty$ and to hope that it corresponds to $\mathbf{u}$ being (in some
sense that needs to be made precise) indistinguishable from a limit of finite linear
combinations of $\ldots, \phi_{-1}, \phi_0, \phi_1, \ldots$

Definition 8.2.1 (Complete Orthonormal System). A bi-infinite sequence of signals
$\ldots, \phi_{-1}, \phi_0, \phi_1, \ldots$ is said to form a complete orthonormal system or a
CONS for the linear subspace $\mathcal{U}$ of $\mathcal{L}_2$ if all three of the following conditions hold:

1) Each element of the sequence is in $\mathcal{U}$:

\[ \phi_\ell \in \mathcal{U}, \quad \ell \in \mathbb{Z}. \tag{8.4} \]

2) The sequence satisfies the orthonormality condition

\[ \langle \phi_\ell, \phi_{\ell'} \rangle = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}. \tag{8.5} \]

3) For every $\mathbf{u} \in \mathcal{U}$ we have

\[ \|\mathbf{u}\|_2^2 = \sum_{\ell=-\infty}^{\infty} \bigl|\langle \mathbf{u}, \phi_\ell \rangle\bigr|^2, \quad \mathbf{u} \in \mathcal{U}. \tag{8.6} \]

¹ A basis for a subspace is defined as a collection of functions such that any function in
the subspace can be represented as a linear combination of a finite number of elements in the
collection. More useful to us will be the notion of a complete orthonormal system. From a
complete orthonormal system we only require that each function can be approximated by a linear
combination of a finite number of functions in the system.

² Mathematicians usually define a CONS only for closed subspaces. Such subspaces are
discussed in Section 8.5.

The following proposition considers equivalent definitions of a CONS and demonstrates
that if $\{\phi_\ell\}$ is a CONS for $\mathcal{U}$, then, indeed, every element of $\mathcal{U}$ can be
approximated by a finite linear combination of the functions $\{\phi_\ell\}$.

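Proposition 8.2.2. Let $\mathcal{U}$ be a subspace of $\mathcal{L}_2$ and let the bi-infinite sequence
$\ldots, \phi_{-2}, \phi_{-1}, \phi_0, \phi_1, \ldots$ satisfy (8.4) & (8.5). Then each of the following
conditions on $\{\phi_\ell\}$ is equivalent to the condition that $\{\phi_\ell\}$ forms a CONS for $\mathcal{U}$:

(a) For every $\mathbf{u} \in \mathcal{U}$ and every $\epsilon > 0$ there exists some positive integer $L(\epsilon)$ and
coefficients $\alpha_{-L(\epsilon)}, \ldots, \alpha_{L(\epsilon)} \in \mathbb{C}$ such that

\[ \Bigl\| \mathbf{u} - \sum_{\ell=-L(\epsilon)}^{L(\epsilon)} \alpha_\ell\, \phi_\ell \Bigr\|_2 < \epsilon. \tag{8.7} \]

(b) For every $\mathbf{u} \in \mathcal{U}$

\[ \lim_{L \to \infty} \Bigl\| \mathbf{u} - \sum_{\ell=-L}^{L} \langle \mathbf{u}, \phi_\ell \rangle\, \phi_\ell \Bigr\|_2 = 0. \tag{8.8} \]

(c) For every $\mathbf{u} \in \mathcal{U}$

\[ \|\mathbf{u}\|_2^2 = \sum_{\ell=-\infty}^{\infty} \bigl|\langle \mathbf{u}, \phi_\ell \rangle\bigr|^2. \tag{8.9} \]

(d) For every $\mathbf{u}, \mathbf{v} \in \mathcal{U}$

\[ \langle \mathbf{u}, \mathbf{v} \rangle = \sum_{\ell=-\infty}^{\infty} \langle \mathbf{u}, \phi_\ell \rangle\, \langle \mathbf{v}, \phi_\ell \rangle^{*}. \tag{8.10} \]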

Proof. Since (8.4) & (8.5) hold (by hypothesis), it follows that the additional
condition (c) is, by Definition 8.2.1, equivalent to $\{\phi_\ell\}$ being a CONS. It thus only
remains to show that the four conditions are equivalent. We shall prove this by
showing that (a) $\Leftrightarrow$ (b); that (b) $\Leftrightarrow$ (c); and that (c) $\Leftrightarrow$ (d).

That (b) implies (a) is obvious because nothing precludes us from choosing $\alpha_\ell$ in
(8.7) to be $\langle \mathbf{u}, \phi_\ell \rangle$. That (a) implies (b) follows because, by Note 4.6.7, the signal

\[ \sum_{\ell=-L}^{L} \langle \mathbf{u}, \phi_\ell \rangle\, \phi_\ell, \]

which we denoted in (8.2) by $\mathbf{u}_L$, is the projection of $\mathbf{u}$ onto the linear subspace
spanned by $(\phi_{-L}, \ldots, \phi_L)$ and as such, by Proposition 4.6.8, best approximates $\mathbf{u}$
among all the signals in that subspace. Consequently, replacing $\alpha_\ell$ by $\langle \mathbf{u}, \phi_\ell \rangle$ can
only reduce the LHS of (8.7).

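To prove (b) $\Rightarrow$ (c) we first note that by letting $L$ tend to infinity in (8.3) it follows
that

\[ \|\mathbf{u}\|_2^2 \ge \sum_{\ell=-\infty}^{\infty} \bigl|\langle \mathbf{u}, \phi_\ell \rangle\bigr|^2, \quad \mathbf{u} \in \mathcal{L}_2, \tag{8.11} \]

so to establish (c) we only need to show that if $\mathbf{u}$ is in $\mathcal{U}$ then $\|\mathbf{u}\|_2^2$ is also
upper-bounded by the RHS of (8.11). To that end we first upper-bound $\|\mathbf{u}\|_2$ as

\[
\begin{aligned}
\|\mathbf{u}\|_2 &= \Bigl\| \mathbf{u} - \sum_{\ell=-L}^{L} \langle \mathbf{u}, \phi_\ell \rangle \phi_\ell + \sum_{\ell=-L}^{L} \langle \mathbf{u}, \phi_\ell \rangle \phi_\ell \Bigr\|_2 \\
&\le \Bigl\| \mathbf{u} - \sum_{\ell=-L}^{L} \langle \mathbf{u}, \phi_\ell \rangle \phi_\ell \Bigr\|_2 + \Bigl\| \sum_{\ell=-L}^{L} \langle \mathbf{u}, \phi_\ell \rangle \phi_\ell \Bigr\|_2 \\
&= \Bigl\| \mathbf{u} - \sum_{\ell=-L}^{L} \langle \mathbf{u}, \phi_\ell \rangle \phi_\ell \Bigr\|_2 + \Bigl( \sum_{\ell=-L}^{L} \bigl|\langle \mathbf{u}, \phi_\ell \rangle\bigr|^2 \Bigr)^{1/2}, \quad \mathbf{u} \in \mathcal{L}_2,
\end{aligned} \tag{8.12}
\]

where the first equality follows by adding and subtracting a term; the subsequent
inequality by the Triangle Inequality (Proposition 3.4.1); and the final equality by the
orthonormality assumption (8.5) and the Pythagorean Theorem (Theorem 4.5.2).
If Condition (b) holds and if $\mathbf{u}$ is in $\mathcal{U}$, then the RHS of (8.12) converges to the
square root of the infinite sum $\sum_{\ell \in \mathbb{Z}} |\langle \mathbf{u}, \phi_\ell \rangle|^2$ and thus gives us the desired upper
bound on $\|\mathbf{u}\|_2$.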

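We next prove (c) $\Rightarrow$ (b). We assume that (c) holds and that $\mathbf{u}$ is in $\mathcal{U}$ and set out
to prove (8.8). To that end we first note that by the basic properties of the inner
product (3.6)-(3.10) and by the orthonormality (8.1) it follows that

\[ \Bigl\langle \underbrace{\mathbf{u} - \sum_{\ell=-L}^{L} \langle \mathbf{u}, \phi_\ell \rangle \phi_\ell},\; \phi_{\ell'} \Bigr\rangle = \langle \mathbf{u}, \phi_{\ell'} \rangle\, \mathrm{I}\{|\ell'| > L\}, \quad \ell' \in \mathbb{Z},\; \mathbf{u} \in \mathcal{L}_2. \]

Consequently, if we apply (c) to the under-braced signal (which for $\mathbf{u} \in \mathcal{U}$ is
also in $\mathcal{U}$) we obtain that (c) implies

\[ \Bigl\| \mathbf{u} - \sum_{\ell=-L}^{L} \langle \mathbf{u}, \phi_\ell \rangle \phi_\ell \Bigr\|_2^2 = \sum_{|\ell| > L} \bigl|\langle \mathbf{u}, \phi_\ell \rangle\bigr|^2, \quad \mathbf{u} \in \mathcal{U}. \]

But by applying (c) to $\mathbf{u}$ we infer that the RHS of the above tends to zero as $L$
tends to infinity, thus establishing (8.8) and hence (b).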

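We next prove (c) $\Leftrightarrow$ (d). The implication (d) $\Rightarrow$ (c) is obvious because we can
always choose $\mathbf{v}$ to be equal to $\mathbf{u}$. We consequently focus on proving (c) $\Rightarrow$ (d).
We do so by assuming that $\mathbf{u}, \mathbf{v} \in \mathcal{U}$ and calculating for every $\beta \in \mathbb{C}$

\[
\begin{aligned}
|\beta|^2 \|\mathbf{u}\|_2^2 &+ 2\,\mathrm{Re}\bigl(\beta \langle \mathbf{u}, \mathbf{v} \rangle\bigr) + \|\mathbf{v}\|_2^2 \\
&= \|\beta \mathbf{u} + \mathbf{v}\|_2^2 \\
&= \sum_{\ell=-\infty}^{\infty} \bigl|\langle \beta \mathbf{u} + \mathbf{v}, \phi_\ell \rangle\bigr|^2 \\
&= \sum_{\ell=-\infty}^{\infty} \bigl|\beta \langle \mathbf{u}, \phi_\ell \rangle + \langle \mathbf{v}, \phi_\ell \rangle\bigr|^2 \\
&= |\beta|^2 \sum_{\ell=-\infty}^{\infty} \bigl|\langle \mathbf{u}, \phi_\ell \rangle\bigr|^2 + 2\,\mathrm{Re}\Bigl(\beta \sum_{\ell=-\infty}^{\infty} \langle \mathbf{u}, \phi_\ell \rangle \langle \mathbf{v}, \phi_\ell \rangle^{*}\Bigr) + \sum_{\ell=-\infty}^{\infty} \bigl|\langle \mathbf{v}, \phi_\ell \rangle\bigr|^2, \quad \bigl(\mathbf{u}, \mathbf{v} \in \mathcal{U},\; \beta \in \mathbb{C}\bigr),
\end{aligned} \tag{8.13}
\]

where the first equality follows by writing $\|\beta \mathbf{u} + \mathbf{v}\|_2^2$ as $\langle \beta \mathbf{u} + \mathbf{v}, \beta \mathbf{u} + \mathbf{v} \rangle$ and using
the basic properties of the inner product (3.6)-(3.10); the second by applying (c)
to $\beta \mathbf{u} + \mathbf{v}$ (which for $\mathbf{u}, \mathbf{v} \in \mathcal{U}$ is also in $\mathcal{U}$); the third by the basic properties of
the inner product; and the final equality by writing the squared magnitude of a
complex number as its product by its conjugate. By applying (c) to $\mathbf{u}$ and by
applying (c) to $\mathbf{v}$ we now obtain from (8.13) that

\[ 2\,\mathrm{Re}\bigl(\beta \langle \mathbf{u}, \mathbf{v} \rangle\bigr) = 2\,\mathrm{Re}\Bigl(\beta \sum_{\ell=-\infty}^{\infty} \langle \mathbf{u}, \phi_\ell \rangle \langle \mathbf{v}, \phi_\ell \rangle^{*}\Bigr), \quad \bigl(\mathbf{u}, \mathbf{v} \in \mathcal{U},\; \beta \in \mathbb{C}\bigr), \]

which can only hold for all $\beta \in \mathbb{C}$ (and in particular for both $\beta = 1$ and $\beta = i$) if

\[ \langle \mathbf{u}, \mathbf{v} \rangle = \sum_{\ell=-\infty}^{\infty} \langle \mathbf{u}, \phi_\ell \rangle \langle \mathbf{v}, \phi_\ell \rangle^{*}, \quad \mathbf{u}, \mathbf{v} \in \mathcal{U}, \]

thus establishing (d). □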

We next describe the two complete orthonormal systems that will be of most in- 
terest to us. 



8.3 The Fourier Series 

A CONS that you have probably already encountered is the one underlying the 
Fourier Series representation. You may have encountered the Fourier Series in the 
context of periodic functions, but we shall focus on a slightly different view. 

Proposition 8.3.1. For every $T > 0$, the functions $\{\phi_\ell\}$ defined for every integer $\ell$
by

\[ \phi_\ell \colon t \mapsto \frac{1}{\sqrt{2T}}\, e^{i\pi \ell t / T}\, \mathrm{I}\{|t| \le T\} \tag{8.14} \]

form a CONS for the subspace

\[ \bigl\{ \mathbf{u} \in \mathcal{L}_2 : u(t) = 0 \text{ whenever } |t| > T \bigr\} \]

of energy-limited signals that vanish outside the interval $[-T, T]$.

Proof. Follows from Theorem A.3.3 in the appendix by substituting $2T$ for $S$. □

Notice that in this case

\[ \langle \mathbf{u}, \phi_\ell \rangle = \int_{-T}^{T} u(t)\, \frac{1}{\sqrt{2T}}\, e^{-i\pi \ell t / T} \,dt \tag{8.15} \]

is the $\ell$-th Fourier Series Coefficient of $\mathbf{u}$; see Note A.3.5 in the appendix with $2T$
substituted for $S$.
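As a numerical illustration of Proposition 8.3.1, the following sketch approximates the Fourier Series Coefficients of one signal and compares partial sums of their squared magnitudes with the signal's energy, as in (8.3) and (8.6). The signal $t \mapsto t\,\mathrm{I}\{|t| \le T\}$, the choice $T = 1$, and the midpoint-rule integration are assumptions made only for this example; they do not come from the text.

```python
import cmath
import math

T = 1.0   # half-width of [-T, T]; an arbitrary choice for this illustration
N = 2000  # number of midpoint-rule nodes (assumed fine enough for |ell| <= 80)
H = 2 * T / N
NODES = [-T + (k + 0.5) * H for k in range(N)]


def u(t):
    """Illustrative signal: u(t) = t for |t| <= T and zero otherwise."""
    return t if abs(t) <= T else 0.0


def coefficient(ell):
    """Midpoint-rule approximation of <u, phi_ell>, with phi_ell as in (8.14)."""
    scale = 1.0 / math.sqrt(2 * T)
    total = 0j
    for t in NODES:
        # The inner product conjugates its second argument, hence the minus sign.
        total += u(t) * scale * cmath.exp(-1j * math.pi * ell * t / T)
    return total * H


# Approximates ||u||^2, whose exact value for this signal is 2 T^3 / 3.
energy = sum(u(t) ** 2 for t in NODES) * H

squared = {ell: abs(coefficient(ell)) ** 2 for ell in range(-80, 81)}
partial = {L: sum(squared[ell] for ell in range(-L, L + 1)) for L in (5, 20, 80)}

for L in sorted(partial):
    # Columns: L, partial sum of (8.3), remaining gap to the energy.
    print(L, partial[L], energy - partial[L])
```

The partial sums increase with $L$ and stay below the energy, and the gap shrinks slowly (roughly like $1/L$ for this discontinuous-at-the-edges example), which is consistent with equality holding only in the limit.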

Note 8.3.2. The dummy argument $t$ is immaterial in Proposition 8.3.1. Indeed, if
we define for $W > 0$ the linear subspace

\[ \mathcal{V} = \bigl\{ \mathbf{g} \in \mathcal{L}_2 : g(f) = 0 \text{ whenever } |f| > W \bigr\}, \tag{8.16} \]

then the functions defined for every integer $\ell$ by

\[ f \mapsto \frac{1}{\sqrt{2W}}\, e^{i\pi \ell f / W}\, \mathrm{I}\{|f| \le W\} \tag{8.17} \]

form a CONS for this subspace.

This note will be crucial when we next discuss a CONS for the space of energy-limited
signals that are bandlimited to W Hz.

8.4 The Sampling Theorem 

We next provide a CONS for the space of energy-limited signals that are bandlimited
to W Hz. Recall that if $\mathbf{x}$ is an energy-limited signal that is bandlimited
to W Hz, then there exists a measurable function³ $g \colon f \mapsto g(f)$ satisfying

\[ g(f) = 0, \quad |f| > W \tag{8.18} \]

and

\[ \int_{-W}^{W} |g(f)|^2 \,df < \infty, \tag{8.19} \]

such that

\[ x(t) = \int_{-W}^{W} g(f)\, e^{i 2\pi f t} \,df, \quad t \in \mathbb{R}. \tag{8.20} \]

³ Loosely speaking, this function is the Fourier Transform of $\mathbf{x}$. But since $\mathbf{x}$ is not necessarily
integrable, its FT $\hat{\mathbf{x}}$ is an equivalence class of signals. Thus, more precisely, the equivalence class
of $\mathbf{g}$ is the $L_2$-Fourier Transform of $\mathbf{x}$. Or, stated differently, $\mathbf{g}$ can be any one of the signals in
the equivalence class of $\hat{\mathbf{x}}$ that is zero outside the interval $[-W, W]$.

Conversely, if $\mathbf{g}$ is any function satisfying (8.18) & (8.19), and if we define $\mathbf{x}$ via
(8.20) as the Inverse Fourier Transform of $\mathbf{g}$, then $\mathbf{x}$ is an energy-limited signal that
is bandlimited to W Hz and its $L_2$-Fourier Transform $\hat{\mathbf{x}}$ is equal to (the equivalence
class of) $\mathbf{g}$.

Thus, if, as in (8.16), we denote by $\mathcal{V}$ the set of all functions (of frequency) satisfying
(8.18) & (8.19), then the set of all energy-limited signals that are bandlimited to W
Hz is just the image of $\mathcal{V}$ under the IFT, i.e., it is the set $\check{\mathcal{V}}$, where

\[ \check{\mathcal{V}} = \bigl\{ \check{\mathbf{g}} : \mathbf{g} \in \mathcal{V} \bigr\}. \tag{8.21} \]

By the Mini Parseval Theorem (Proposition 6.2.6 (i)), if $\mathbf{x}_1$ and $\mathbf{x}_2$ are given by
$\check{\mathbf{g}}_1$ and $\check{\mathbf{g}}_2$, where $\mathbf{g}_1, \mathbf{g}_2$ are in $\mathcal{V}$, then

\[ \langle \mathbf{x}_1, \mathbf{x}_2 \rangle = \langle \mathbf{g}_1, \mathbf{g}_2 \rangle, \tag{8.22} \]

i.e.,

\[ \langle \check{\mathbf{g}}_1, \check{\mathbf{g}}_2 \rangle = \langle \mathbf{g}_1, \mathbf{g}_2 \rangle, \quad \mathbf{g}_1, \mathbf{g}_2 \in \mathcal{V}. \tag{8.23} \]

The following lemma is a simple but very useful consequence of (8.23).

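Lemma 8.4.1. If $\{\psi_\ell\}$ is a CONS for the subspace $\mathcal{V}$, which is defined in (8.16),
then $\{\check{\psi}_\ell\}$ is a CONS for the subspace $\check{\mathcal{V}}$, which is defined in (8.21).

Proof. Let $\{\psi_\ell\}$ be a CONS for the subspace $\mathcal{V}$. By (8.23),

\[ \langle \check{\psi}_\ell, \check{\psi}_{\ell'} \rangle = \langle \psi_\ell, \psi_{\ell'} \rangle, \quad \ell, \ell' \in \mathbb{Z}, \]

so our assumption that $\{\psi_\ell\}$ is a CONS for $\mathcal{V}$ (and hence that, a fortiori, it satisfies
$\langle \psi_\ell, \psi_{\ell'} \rangle = \mathrm{I}\{\ell = \ell'\}$ for all $\ell, \ell' \in \mathbb{Z}$) implies that

\[ \langle \check{\psi}_\ell, \check{\psi}_{\ell'} \rangle = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}. \]

It remains to verify that for every $\mathbf{x} \in \check{\mathcal{V}}$

\[ \sum_{\ell=-\infty}^{\infty} \bigl|\langle \mathbf{x}, \check{\psi}_\ell \rangle\bigr|^2 = \|\mathbf{x}\|_2^2. \]

Equivalently, since every $\mathbf{x} \in \check{\mathcal{V}}$ can be written as $\check{\mathbf{g}}$ for some $\mathbf{g} \in \mathcal{V}$, we need to
show that

\[ \sum_{\ell=-\infty}^{\infty} \bigl|\langle \check{\mathbf{g}}, \check{\psi}_\ell \rangle\bigr|^2 = \|\check{\mathbf{g}}\|_2^2, \quad \mathbf{g} \in \mathcal{V}. \]

This follows from (8.23) and from our assumption that $\{\psi_\ell\}$ is a CONS for $\mathcal{V}$
because

\[
\begin{aligned}
\sum_{\ell=-\infty}^{\infty} \bigl|\langle \check{\mathbf{g}}, \check{\psi}_\ell \rangle\bigr|^2 &= \sum_{\ell=-\infty}^{\infty} \bigl|\langle \mathbf{g}, \psi_\ell \rangle\bigr|^2 \\
&= \|\mathbf{g}\|_2^2 \\
&= \|\check{\mathbf{g}}\|_2^2, \quad \mathbf{g} \in \mathcal{V},
\end{aligned}
\]

where the first equality follows from (8.23) (by substituting $\mathbf{g}$ for $\mathbf{g}_1$ and by
substituting $\psi_\ell$ for $\mathbf{g}_2$); the second from the assumption that $\{\psi_\ell\}$ is a CONS for $\mathcal{V}$;
and the final equality from (8.23) (by substituting $\mathbf{g}$ for $\mathbf{g}_1$ and for $\mathbf{g}_2$). □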

Using this lemma and Note 8.3.2 we now derive a CONS for the subspace $\check{\mathcal{V}}$ of
energy-limited signals that are bandlimited to W Hz.

Proposition 8.4.2 (A CONS for the Subspace of Energy-Limited Signals that
Are Bandlimited to W Hz).

(i) The sequence of signals that are defined for every integer $\ell$ by

\[ t \mapsto \sqrt{2W}\, \mathrm{sinc}(2Wt + \ell) \tag{8.24} \]

forms a CONS for the space of energy-limited signals that are bandlimited
to W Hz.

(ii) If $\mathbf{x}$ is an energy-limited signal that is bandlimited to W Hz, then its inner
product with the $\ell$-th signal is given by its scaled sample at time $-\ell/(2W)$:

\[ \bigl\langle \mathbf{x},\; t \mapsto \sqrt{2W}\, \mathrm{sinc}(2Wt + \ell) \bigr\rangle = \frac{1}{\sqrt{2W}}\, x\Bigl(-\frac{\ell}{2W}\Bigr), \quad \ell \in \mathbb{Z}. \tag{8.25} \]

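Proof. To prove Part (i) we recall that, by Note 8.3.2, the functions defined for
every $\ell \in \mathbb{Z}$ by

\[ \psi_\ell \colon f \mapsto \frac{1}{\sqrt{2W}}\, e^{i\pi \ell f / W}\, \mathrm{I}\{|f| \le W\} \tag{8.26} \]

form a CONS for the subspace $\mathcal{V}$. Consequently, by Lemma 8.4.1, their Inverse
Fourier Transforms $\{\check{\psi}_\ell\}$ form a CONS for $\check{\mathcal{V}}$. It just remains to evaluate $\check{\psi}_\ell$
explicitly in order to verify that it is a scaled shifted $\mathrm{sinc}(\cdot)$:

\[
\begin{aligned}
\check{\psi}_\ell(t) &= \int_{-\infty}^{\infty} \psi_\ell(f)\, e^{i 2\pi f t} \,df \\
&= \int_{-W}^{W} \frac{1}{\sqrt{2W}}\, e^{i\pi \ell f / W}\, e^{i 2\pi f t} \,df \qquad (8.27) \\
&= \sqrt{2W}\, \mathrm{sinc}(2Wt + \ell), \qquad (8.28)
\end{aligned}
\]

where the last calculation can be verified by direct computation as in (6.35).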
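The step from (8.27) to (8.28) can also be sanity-checked numerically. The sketch below evaluates the integral in (8.27) with a midpoint rule and compares it with the closed form (8.28). It assumes the normalized convention $\mathrm{sinc}(\xi) = \sin(\pi\xi)/(\pi\xi)$ with $\mathrm{sinc}(0) = 1$; the function names and the test values of $W$, $\ell$, and $t$ are hypothetical choices for illustration.

```python
import cmath
import math


def sinc(x):
    """Normalized sinc, sin(pi x)/(pi x), with sinc(0) = 1 (assumed convention)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def ift_of_psi(ell, t, W, n=20000):
    """Midpoint-rule value of the integral in (8.27)."""
    h = 2 * W / n
    total = 0j
    for k in range(n):
        f = -W + (k + 0.5) * h
        total += cmath.exp(1j * math.pi * ell * f / W) * cmath.exp(2j * math.pi * f * t)
    return total * h / math.sqrt(2 * W)


def closed_form(ell, t, W):
    """Right-hand side of (8.28)."""
    return math.sqrt(2 * W) * sinc(2 * W * t + ell)


W, ell, t = 3.0, 2, 0.37  # arbitrary test values
print(ift_of_psi(ell, t, W), closed_form(ell, t, W))
```

The numerically integrated value is complex with a negligible imaginary part, and its real part should agree with the closed form to within the accuracy of the quadrature.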

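We next prove Part (ii). Since $\mathbf{x}$ is an energy-limited signal that is bandlimited
to W Hz, it follows that there exists some $\mathbf{g} \in \mathcal{V}$ such that

\[ \mathbf{x} = \check{\mathbf{g}}, \tag{8.29} \]

i.e.,

\[ x(t) = \int_{-W}^{W} g(f)\, e^{i 2\pi f t} \,df, \quad t \in \mathbb{R}. \tag{8.30} \]

Consequently,

\[
\begin{aligned}
\bigl\langle \mathbf{x},\; t \mapsto \sqrt{2W}\, \mathrm{sinc}(2Wt + \ell) \bigr\rangle &= \langle \mathbf{x}, \check{\psi}_\ell \rangle \\
&= \langle \check{\mathbf{g}}, \check{\psi}_\ell \rangle \\
&= \langle \mathbf{g}, \psi_\ell \rangle \\
&= \int_{-W}^{W} g(f) \Bigl( \frac{1}{\sqrt{2W}}\, e^{i\pi \ell f / W} \Bigr)^{*} \,df \\
&= \frac{1}{\sqrt{2W}} \int_{-W}^{W} g(f)\, e^{-i\pi \ell f / W} \,df \\
&= \frac{1}{\sqrt{2W}}\, x\Bigl(-\frac{\ell}{2W}\Bigr), \quad \ell \in \mathbb{Z},
\end{aligned}
\]

where the first equality follows from (8.28); the second by (8.29); the third by (8.23)
(with the substitution of $\mathbf{g}$ for $\mathbf{g}_1$ and $\psi_\ell$ for $\mathbf{g}_2$); the fourth by the definition of
the inner product and by (8.26); the fifth by conjugating the complex exponential;
and the final equality by substituting $-\ell/(2W)$ for $t$ in (8.30). □

Using Proposition 8.4.2 and Proposition 8.2.2 we obtain the following $\mathcal{L}_2$ version
of the Sampling Theorem.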

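Theorem 8.4.3 ($\mathcal{L}_2$-Sampling Theorem). Let $\mathbf{x}$ be an energy-limited signal that
is bandlimited to W Hz, where $W > 0$, and let

\[ T = \frac{1}{2W}. \tag{8.31} \]

(i) The signal $\mathbf{x}$ can be reconstructed from the sequence $\ldots, x(-T), x(0), x(T), \ldots$
of its values at integer multiples of $T$ in the sense that

\[ \lim_{L \to \infty} \int_{-\infty}^{\infty} \Bigl| x(t) - \sum_{\ell=-L}^{L} x(-\ell T)\, \mathrm{sinc}\Bigl(\frac{t}{T} + \ell\Bigr) \Bigr|^2 \,dt = 0. \]

(ii) The signal's energy can be reconstructed from its samples via the relation

\[ \int_{-\infty}^{\infty} |x(t)|^2 \,dt = T \sum_{\ell=-\infty}^{\infty} |x(\ell T)|^2. \]

(iii) If $\mathbf{y}$ is another energy-limited signal that is bandlimited to W Hz, then

\[ \langle \mathbf{x}, \mathbf{y} \rangle = T \sum_{\ell=-\infty}^{\infty} x(\ell T)\, y^{*}(\ell T). \]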
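Part (ii) of Theorem 8.4.3 lends itself to a quick numerical check. The sketch below uses the signal $t \mapsto \mathrm{sinc}^2(Wt)$, which is an assumption of this example rather than something taken from the text: its Fourier Transform is a triangle supported on $[-W, W]$, so it is bandlimited to W Hz, and its energy is $2/(3W)$ because $\int \mathrm{sinc}^4(\xi)\,d\xi = 2/3$. The normalized convention $\mathrm{sinc}(\xi) = \sin(\pi\xi)/(\pi\xi)$ is assumed, and the infinite sum is truncated at a large index.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x)/(pi x), with sinc(0) = 1 (assumed convention)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


W = 2.0              # bandwidth in Hz (illustrative value)
T = 1.0 / (2.0 * W)  # sampling interval of (8.31)


def x(t):
    """Example signal sinc^2(W t), bandlimited to W Hz."""
    return sinc(W * t) ** 2


def energy_from_samples(step, L=20000):
    """Truncated version of step * sum_ell |x(ell * step)|^2."""
    return step * sum(x(ell * step) ** 2 for ell in range(-L, L + 1))


exact_energy = 2.0 / (3.0 * W)  # closed form for this particular signal

print(energy_from_samples(T), energy_from_samples(T / 2), exact_energy)
```

Sampling at the interval $T$ of (8.31) and at the finer interval $T/2$ should both reproduce the energy, since a signal that is bandlimited to W Hz is also bandlimited to any larger bandwidth.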

Note 8.4.4. If $T \le 1/(2W)$, then any energy-limited signal $\mathbf{x}$ that is bandlimited
to W Hz is also bandlimited to $1/(2T)$ Hz. Consequently Theorem 8.4.3 continues
to hold if we replace (8.31) with the condition

\[ 0 < T \le \frac{1}{2W}. \tag{8.32} \]

Table 8.1 highlights the duality between the Sampling Theorem and the Fourier
Series.

We also mention here without proof a version of the Sampling Theorem that allows
one to reconstruct the signal pointwise, i.e., at every epoch $t$. Thus, while
Theorem 8.4.3 guarantees that, as more and more terms in the sum of the shifted sinc
functions are added, the energy in the error function tends to zero, the following
theorem demonstrates that at every fixed time $t$ the error tends to zero.

Theorem 8.4.5 (Pointwise Sampling Theorem). If the signal $\mathbf{x}$ can be represented
as

\[ x(t) = \int_{-W}^{W} g(f)\, e^{i 2\pi f t} \,df, \quad t \in \mathbb{R} \tag{8.33} \]

for some function $\mathbf{g}$ satisfying

\[ \int_{-W}^{W} |g(f)| \,df < \infty, \tag{8.34} \]

and if $0 < T \le 1/(2W)$, then for every $t \in \mathbb{R}$

\[ x(t) = \lim_{L \to \infty} \sum_{\ell=-L}^{L} x(-\ell T)\, \mathrm{sinc}\Bigl(\frac{t}{T} + \ell\Bigr). \tag{8.35} \]

Proof. See (Pinsky, 2002, Chapter 4, Section 4.2.3, Theorem 4.2.13). □ 
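To make (8.35) concrete, the following sketch evaluates the truncated interpolation sum at a time that is not a sampling instant and compares it with the true value of the signal. The test signal $t \mapsto \mathrm{sinc}^2(Wt)$ (whose transform is an integrable triangle on $[-W, W]$), the values of $W$, $t$, and $L$, and the normalized sinc convention are all assumptions made for this illustration.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x)/(pi x), with sinc(0) = 1 (assumed convention)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


W = 2.0              # bandwidth in Hz (illustrative value)
T = 1.0 / (2.0 * W)  # sampling interval, here chosen equal to 1/(2W)


def x(t):
    """Example signal sinc^2(W t), bandlimited to W Hz."""
    return sinc(W * t) ** 2


def reconstruct(t, step, L):
    """Truncated sum of (8.35): sum over |ell| <= L of x(-ell*step) sinc(t/step + ell)."""
    return sum(x(-ell * step) * sinc(t / step + ell) for ell in range(-L, L + 1))


t0 = 0.13  # a time strictly between two sampling instants
for L in (2, 20, 200):
    print(L, reconstruct(t0, T, L), x(t0))
```

For this signal the samples decay like $1/\ell^2$, so the truncation error at a fixed time falls off roughly like $1/L^2$; signals whose samples decay more slowly would need many more terms.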

The Sampling Theorem goes by various names. It is sometimes attributed to 
Claude Elwood Shannon (1916-2001), the founder of Information Theory. But 
it also appears in the works of Vladimir Aleksandrovich Kotelnikov (1908-2005), 
Harry Nyquist (1889-1976), and Edmund Taylor Whittaker (1873-1956). For fur- 
ther references regarding the history of this result and for a survey of many related 
results, see (Unser, 2000). 



8.5 Closed Subspaces of $\mathcal{L}_2$

Our definition of a CONS for a subspace $\mathcal{U}$ is not quite standard, because we only
assumed that $\mathcal{U}$ is a linear subspace; we did not assume that $\mathcal{U}$ is closed. In this
section we shall define closed linear subspaces and derive a condition for a sequence
$\{\phi_\ell\}$ to form a CONS for a closed subspace $\mathcal{U}$. (The set of energy-limited signals
that vanish outside the interval $[-T, T]$ is closed, as is the class of energy-limited
signals that are bandlimited to W Hz.)

Before proceeding to define closed linear subspaces, we pause here to recall that
the space $\mathcal{L}_2$ is complete.⁴

⁴ This property is usually stated about $L_2$ but we prefer to work with $\mathcal{L}_2$.

Theorem 8.5.1 ($\mathcal{L}_2$ Is Complete). If the sequence $\mathbf{u}_1, \mathbf{u}_2, \ldots$ of signals in $\mathcal{L}_2$ is
such that for any $\epsilon > 0$ there exists a positive integer $L(\epsilon)$ such that

\[ \|\mathbf{u}_n - \mathbf{u}_m\|_2 < \epsilon, \quad n, m \ge L(\epsilon), \]

then there exists some function $\mathbf{u} \in \mathcal{L}_2$ such that

\[ \lim_{n \to \infty} \|\mathbf{u} - \mathbf{u}_n\|_2 = 0. \]

Proof. See, for example, (Rudin, 1974, Chapter 3, Theorem 3.11). □

Definition 8.5.2 (Closed Subspace). A linear subspace $\mathcal{U}$ of $\mathcal{L}_2$ is said to be
closed if for any sequence of signals $\mathbf{u}_1, \mathbf{u}_2, \ldots$ in $\mathcal{U}$ and any $\mathbf{u} \in \mathcal{L}_2$, the condition
$\|\mathbf{u} - \mathbf{u}_n\|_2 \to 0$ implies that $\mathbf{u}$ is indistinguishable from some element of $\mathcal{U}$.

Before stating the next theorem we remind the reader that a bi-infinite sequence
of complex numbers $\ldots, \alpha_{-1}, \alpha_0, \alpha_1, \ldots$ is said to be square summable if

\[ \sum_{\ell=-\infty}^{\infty} |\alpha_\ell|^2 < \infty. \]

Theorem 8.5.3 (Riesz-Fischer). Let $\mathcal{U}$ be a closed linear subspace of $\mathcal{L}_2$, and let
the bi-infinite sequence $\ldots, \phi_{-1}, \phi_0, \phi_1, \ldots$ satisfy (8.4) & (8.5). Let the bi-infinite
sequence of complex numbers $\ldots, \alpha_{-1}, \alpha_0, \alpha_1, \ldots$ be square summable. Then there
exists an element $\mathbf{u}$ in $\mathcal{U}$ satisfying

\[ \lim_{L \to \infty} \Bigl\| \mathbf{u} - \sum_{\ell=-L}^{L} \alpha_\ell\, \phi_\ell \Bigr\|_2 = 0; \tag{8.36a} \]

\[ \langle \mathbf{u}, \phi_\ell \rangle = \alpha_\ell, \quad \ell \in \mathbb{Z}; \tag{8.36b} \]

and

\[ \|\mathbf{u}\|_2^2 = \sum_{\ell=-\infty}^{\infty} |\alpha_\ell|^2. \tag{8.36c} \]

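Proof. Define for every positive integer $L$

\[ \mathbf{u}_L = \sum_{\ell=-L}^{L} \alpha_\ell\, \phi_\ell, \quad L \in \mathbb{N}. \tag{8.37} \]

Since, by hypothesis, $\mathcal{U}$ is a linear subspace and the signals $\{\phi_\ell\}$ are all in $\mathcal{U}$, it
follows that $\mathbf{u}_L \in \mathcal{U}$. By the orthonormality assumption (8.5) and by the Pythagorean
Theorem (Theorem 4.5.2), it follows that

\[
\begin{aligned}
\|\mathbf{u}_n - \mathbf{u}_m\|_2^2 &= \sum_{\min\{m,n\} < |\ell| \le \max\{m,n\}} |\alpha_\ell|^2 \\
&\le \sum_{\min\{m,n\} < |\ell| < \infty} |\alpha_\ell|^2, \quad n, m \in \mathbb{N}.
\end{aligned}
\]

From this and from the square summability of $\{\alpha_\ell\}$, it follows that for any $\epsilon > 0$
we have that $\|\mathbf{u}_n - \mathbf{u}_m\|_2$ is smaller than $\epsilon$ whenever both $n$ and $m$ are sufficiently
large. By the completeness of $\mathcal{L}_2$ it thus follows that there exists some $\mathbf{u}' \in \mathcal{L}_2$
such that

\[ \lim_{L \to \infty} \|\mathbf{u}' - \mathbf{u}_L\|_2 = 0. \tag{8.38} \]

Since $\mathcal{U}$ is closed, and since $\mathbf{u}_L$ is in $\mathcal{U}$ for every $L \in \mathbb{N}$, it follows from (8.38) that $\mathbf{u}'$
is indistinguishable from some element $\mathbf{u}$ of $\mathcal{U}$:

\[ \|\mathbf{u} - \mathbf{u}'\|_2 = 0. \tag{8.39} \]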

It now follows from (8.38) and (8.39) that

\[ \lim_{L \to \infty} \|\mathbf{u} - \mathbf{u}_L\|_2 = 0, \tag{8.40} \]

as can be verified using (4.14) (with the substitution of $(\mathbf{u}' - \mathbf{u}_L)$ for $\mathbf{x}$ and of $(\mathbf{u} - \mathbf{u}')$
for $\mathbf{y}$). Combining (8.40) with (8.37) establishes (8.36a).

To establish (8.36b) we use (8.40) and the continuity of the inner product
(Proposition 3.4.2) to calculate $\langle \mathbf{u}, \phi_\ell \rangle$ for every fixed $\ell \in \mathbb{Z}$ as follows:

\[
\begin{aligned}
\langle \mathbf{u}, \phi_\ell \rangle &= \lim_{L \to \infty} \langle \mathbf{u}_L, \phi_\ell \rangle \\
&= \lim_{L \to \infty} \Bigl\langle \sum_{\ell'=-L}^{L} \alpha_{\ell'}\, \phi_{\ell'},\; \phi_\ell \Bigr\rangle \\
&= \lim_{L \to \infty} \alpha_\ell\, \mathrm{I}\{|\ell| \le L\} \\
&= \alpha_\ell, \quad \ell \in \mathbb{Z},
\end{aligned}
\]

where the first equality follows from (8.40) and from the continuity of the inner
product (Proposition 3.4.2); the second by (8.37); the third by the orthonormality
(8.5); and the final equality because $\alpha_\ell\, \mathrm{I}\{|\ell| \le L\}$ is equal to $\alpha_\ell$ whenever $L$ is
large enough (i.e., is at least $|\ell|$).

It remains to prove (8.36c). By the orthonormality of $\{\phi_\ell\}$ and the Pythagorean
Theorem (Theorem 4.5.2)

\[ \|\mathbf{u}_L\|_2^2 = \sum_{\ell=-L}^{L} |\alpha_\ell|^2, \quad L \in \mathbb{N}. \tag{8.41} \]

Also, by (4.14) (with the substitution of $\mathbf{u}$ for $\mathbf{x}$ and of $(\mathbf{u}_L - \mathbf{u})$ for $\mathbf{y}$) we obtain

\[ \|\mathbf{u}\|_2 - \|\mathbf{u} - \mathbf{u}_L\|_2 \le \|\mathbf{u}_L\|_2 \le \|\mathbf{u}\|_2 + \|\mathbf{u} - \mathbf{u}_L\|_2. \tag{8.42} \]

It now follows from (8.42), (8.40), and the Sandwich Theorem⁵ that

\[ \lim_{L \to \infty} \|\mathbf{u}_L\|_2 = \|\mathbf{u}\|_2, \tag{8.43} \]

which combines with (8.41) to prove (8.36c). □

⁵ The Sandwich Theorem states that if the sequences of real numbers $\{a_n\}$, $\{b_n\}$ and $\{c_n\}$ are
such that $b_n \le a_n \le c_n$ for every $n$, and if the sequences $\{b_n\}$ and $\{c_n\}$ converge to the same
limit, then $\{a_n\}$ also converges to that limit.

By applying Theorem 8.5.3 to the space of energy-limited signals that are bandlimited
to W Hz and to the CONS that we derived for that space in Proposition 8.4.2
we obtain:

Proposition 8.5.4. Any square-summable bi-infinite sequence of complex numbers
corresponds to the samples at integer multiples of $T$ of an energy-limited signal that
is bandlimited to $1/(2T)$ Hz. Here $T > 0$ is arbitrary.

Proof. Let $\ldots, \beta_{-1}, \beta_0, \beta_1, \ldots$ be a square-summable bi-infinite sequence of complex
numbers, and let $W = 1/(2T)$. We seek a signal $\mathbf{u}$ that is an energy-limited
signal that is bandlimited to W Hz and whose samples are given by $u(\ell T) = \beta_\ell$,
for every integer $\ell$. Since the set of all energy-limited signals that are bandlimited
to W Hz is a closed linear subspace of $\mathcal{L}_2$, and since the sequence $\{\check{\psi}_\ell\}$ (given
explicitly in (8.28) as $\check{\psi}_\ell \colon t \mapsto \sqrt{2W}\, \mathrm{sinc}(2Wt + \ell)$) is an orthonormal sequence in that
subspace, it follows from Theorem 8.5.3 (with the substitution of $\check{\psi}_\ell$ for $\phi_\ell$ and of
$\beta_{-\ell}/\sqrt{2W}$ for $\alpha_\ell$) that there exists an energy-limited signal $\mathbf{u}$ that is bandlimited
to W Hz and for which

\[ \langle \mathbf{u}, \check{\psi}_\ell \rangle = \frac{1}{\sqrt{2W}}\, \beta_{-\ell}, \quad \ell \in \mathbb{Z}. \tag{8.44} \]

By Proposition 8.4.2,

\[ \langle \mathbf{u}, \check{\psi}_\ell \rangle = \frac{1}{\sqrt{2W}}\, u(-\ell T), \quad \ell \in \mathbb{Z}, \tag{8.45} \]

so by (8.44) and (8.45)

\[ u(-\ell T) = \beta_{-\ell}, \quad \ell \in \mathbb{Z}. \qquad □ \]
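The construction in the proof of Proposition 8.5.4 can be sketched for a sequence with finitely many nonzero terms, in which case the limit in (8.36a) is a finite sum. Substituting $\beta_{-\ell}/\sqrt{2W}$ for $\alpha_\ell$ and using $\check{\psi}_\ell(t) = \sqrt{2W}\,\mathrm{sinc}(2Wt + \ell)$ with $W = 1/(2T)$ gives $u(t) = \sum_m \beta_m\, \mathrm{sinc}(t/T - m)$. The particular sequence, the value of $T$, and the normalized sinc convention below are assumptions chosen only for this example.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x)/(pi x), with sinc(0) = 1 (assumed convention)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


T = 0.5  # sampling interval; the signal is then bandlimited to 1/(2T) = 1 Hz

# Hypothetical square-summable sequence: only finitely many nonzero entries.
beta = {-2: 1.0 + 0.5j, 0: -2.0, 1: 0.25j, 3: 0.75}


def u(t):
    """Bandlimited signal whose sample at m*T is beta[m] (zero where unspecified)."""
    return sum(b * sinc(t / T - m) for m, b in beta.items())


# Energy predicted by (8.36c): sum of |alpha_ell|^2 = T * sum of |beta_m|^2.
energy = T * sum(abs(b) ** 2 for b in beta.values())

for m in range(-3, 5):
    print(m, u(m * T), beta.get(m, 0.0))
print("energy:", energy)
```

Each sinc term equals one at its own sampling instant and zero at all the others, which is why the samples of $u$ reproduce the sequence.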

We now give an alternative characterization of a CONS for a closed subspace of $\mathcal{L}_2$.
This result will not be used later in the book.

Proposition 8.5.5 (Characterization of a CONS for a Closed Subspace).

(i) If the bi-infinite sequence $\{\phi_\ell\}$ is a CONS for the linear subspace $\mathcal{U} \subseteq \mathcal{L}_2$,
then an element of $\mathcal{U}$ whose inner product with $\phi_\ell$ is zero for every integer $\ell$
must have zero energy:

\[ \Bigl( \langle \mathbf{u}, \phi_\ell \rangle = 0, \;\; \ell \in \mathbb{Z} \Bigr) \implies \Bigl( \|\mathbf{u}\|_2 = 0 \Bigr), \quad \mathbf{u} \in \mathcal{U}. \tag{8.46} \]

(ii) If $\mathcal{U}$ is a closed subspace of $\mathcal{L}_2$ and if the bi-infinite sequence $\{\phi_\ell\}$ satisfies
(8.4) & (8.5), then Condition (8.46) is equivalent to the condition that $\{\phi_\ell\}$
forms a CONS for $\mathcal{U}$.

Proof. We begin by proving Part (i). By definition, if $\{\phi_\ell\}$ is a CONS for $\mathcal{U}$, then
(8.6) must hold for every $\mathbf{u} \in \mathcal{U}$. Consequently, if for some $\mathbf{u} \in \mathcal{U}$ we have
that $\langle \mathbf{u}, \phi_\ell \rangle$ is zero for all $\ell \in \mathbb{Z}$, then the RHS of (8.6) is zero and hence the LHS
must also be zero, thus showing that $\mathbf{u}$ must be of zero energy.

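We next turn to Part (ii) and assume that $\mathcal{U}$ is closed and that the bi-infinite
sequence $\{\phi_\ell\}$ satisfies (8.4) & (8.5). That the condition that $\{\phi_\ell\}$ is a CONS
implies Condition (8.46) follows from Part (i). It thus remains to show that if
Condition (8.46) holds, then $\{\phi_\ell\}$ is a CONS. To prove this we now assume that $\mathcal{U}$
is a closed subspace; that $\{\phi_\ell\}$ satisfies (8.4) & (8.5); and that (8.46) holds and
set out to prove that

\[ \|\mathbf{u}\|_2^2 = \sum_{\ell=-\infty}^{\infty} \bigl|\langle \mathbf{u}, \phi_\ell \rangle\bigr|^2, \quad \mathbf{u} \in \mathcal{U}. \tag{8.47} \]

To establish (8.47) fix some arbitrary $\mathbf{u} \in \mathcal{U}$. Since $\mathcal{U} \subseteq \mathcal{L}_2$, the fact that $\mathbf{u}$ is
in $\mathcal{U}$ implies that it is of finite energy, which combines with (8.3) to imply that the
bi-infinite sequence $\ldots, \langle \mathbf{u}, \phi_{-1} \rangle, \langle \mathbf{u}, \phi_0 \rangle, \langle \mathbf{u}, \phi_1 \rangle, \ldots$ is square summable. Since,
by hypothesis, $\mathcal{U}$ is closed, this implies, by Theorem 8.5.3 (with the substitution
of $\langle \mathbf{u}, \phi_\ell \rangle$ for $\alpha_\ell$), that there exists some element $\tilde{\mathbf{u}} \in \mathcal{U}$ such that

\[ \lim_{L \to \infty} \Bigl\| \tilde{\mathbf{u}} - \sum_{\ell=-L}^{L} \langle \mathbf{u}, \phi_\ell \rangle\, \phi_\ell \Bigr\|_2 = 0; \tag{8.48a} \]

\[ \langle \tilde{\mathbf{u}}, \phi_\ell \rangle = \langle \mathbf{u}, \phi_\ell \rangle, \quad \ell \in \mathbb{Z}; \tag{8.48b} \]

and

\[ \|\tilde{\mathbf{u}}\|_2^2 = \sum_{\ell=-\infty}^{\infty} \bigl|\langle \mathbf{u}, \phi_\ell \rangle\bigr|^2. \tag{8.48c} \]

By (8.48b) it follows that the element $\mathbf{u} - \tilde{\mathbf{u}}$ of $\mathcal{U}$ satisfies

\[ \langle \mathbf{u} - \tilde{\mathbf{u}}, \phi_\ell \rangle = 0, \quad \ell \in \mathbb{Z}, \]

and hence, by Condition (8.46), is of zero energy

\[ \|\mathbf{u} - \tilde{\mathbf{u}}\|_2 = 0, \tag{8.49} \]

so $\mathbf{u}$ and $\tilde{\mathbf{u}}$ are indistinguishable and hence

\[ \|\mathbf{u}\|_2 = \|\tilde{\mathbf{u}}\|_2. \]

This combines with (8.48c) to prove (8.47). □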

8.6 An Isomorphism 

In this section we collect the results of Theorem 8.4.3 and Proposition 8.5.4 into a 
single theorem about the isomorphism between the space of energy-limited signals 
that are bandlimited to W Hz and the space of square-summable sequences. This 
theorem is at the heart of quantization schemes for bandlimited signals. It demon- 
strates that to describe a bandlimited signal one can use discrete-time processing to 
quantize its samples and one can then map the quantized samples to a bandlimited 
signal. The energy in the error signal corresponding to the difference between the 
original signal and its description is then proportional to the sum of the squared 
differences between the samples of the original signal and the quantized version. 

Theorem 8.6.1 (Bandlimited Signals and Square-Summable Sequences). Let
$T = 1/(2W)$, where $W > 0$.

(i) If $\mathbf{u}$ is an energy-limited signal that is bandlimited to W Hz, then the
bi-infinite sequence

\[ \ldots, u(-T), u(0), u(T), u(2T), \ldots \]

consisting of its samples taken at integer multiples of $T$ is square summable
and

\[ T \sum_{\ell=-\infty}^{\infty} |u(\ell T)|^2 = \|\mathbf{u}\|_2^2. \]

(ii) More generally, if $\mathbf{u}$ and $\mathbf{v}$ are energy-limited signals that are bandlimited
to W Hz, then

\[ T \sum_{\ell=-\infty}^{\infty} u(\ell T)\, v^{*}(\ell T) = \langle \mathbf{u}, \mathbf{v} \rangle. \]

(iii) If $\{\alpha_\ell\}$ is a bi-infinite square-summable sequence, then there exists an
energy-limited signal $\mathbf{u}$ that is bandlimited to W Hz such that its samples are given
by

\[ u(\ell T) = \alpha_\ell, \quad \ell \in \mathbb{Z}. \]

(iv) The mapping that maps every energy-limited signal that is bandlimited to W
Hz to the square-summable sequence consisting of its samples is linear.
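The remark about quantization can be illustrated with a small experiment: quantize a handful of samples, build the error signal by sinc interpolation, and compare its numerically integrated energy with $T$ times the sum of squared sample differences, as Parts (i) and (iv) of Theorem 8.6.1 predict. The sample values, the uniform quantizer with step `DELTA`, the integration window, and the normalized sinc convention are all hypothetical choices for this sketch.

```python
import math


def sinc(x):
    """Normalized sinc, sin(pi x)/(pi x), with sinc(0) = 1 (assumed convention)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


T = 0.5      # sampling interval, so W = 1/(2T) = 1 Hz
DELTA = 0.5  # step of a hypothetical uniform quantizer

# Hypothetical samples u(ell * T); all other samples are taken to be zero.
samples = {-2: 0.3, -1: -1.2, 0: 1.9, 1: 0.7, 2: -0.4}
quantized = {ell: DELTA * round(a / DELTA) for ell, a in samples.items()}


def error_signal(t):
    """Difference between the original and the reconstructed signals at time t."""
    return sum((samples[ell] - quantized[ell]) * sinc(t / T - ell) for ell in samples)


# Energy of the error signal as predicted by Theorem 8.6.1.
predicted = T * sum((samples[ell] - quantized[ell]) ** 2 for ell in samples)


def numeric_energy(radius=200.0, step=T / 4):
    """Midpoint Riemann sum of |error_signal|^2 over [-radius, radius]."""
    n = int(2 * radius / step)
    return step * sum(error_signal(-radius + (k + 0.5) * step) ** 2 for k in range(n))


print(predicted, numeric_energy())
```

The two numbers agree up to the small amount of energy that lies outside the finite integration window.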



8.7 Prolate Spheroidal Wave Functions 

The following result, which is due to Slepian and Pollak, will not be used in this 
book; it is included for its sheer beauty. 

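Theorem 8.7.1. Let the positive constants $T > 0$ and $W > 0$ be given. Then
there exists a sequence of real functions $\phi_1, \phi_2, \ldots$ and a corresponding sequence
of positive numbers $\lambda_1 > \lambda_2 > \cdots$ such that:

(i) The sequence $\phi_1, \phi_2, \ldots$ forms a CONS for the space of energy-limited signals
that are bandlimited to W Hz, so, a fortiori,

\[ \int_{-\infty}^{\infty} \phi_\ell(t)\, \phi_{\ell'}(t) \,dt = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{N}. \tag{8.50a} \]

(ii) The sequence of scaled and time-windowed functions $\phi_{1,\mathrm{w}}, \phi_{2,\mathrm{w}}, \ldots$ defined at
every $t \in \mathbb{R}$ by

\[ \phi_{\ell,\mathrm{w}}(t) = \frac{1}{\sqrt{\lambda_\ell}}\, \phi_\ell(t)\, \mathrm{I}\Bigl\{|t| \le \frac{T}{2}\Bigr\}, \quad \ell \in \mathbb{N} \tag{8.50b} \]

forms a CONS for the subspace of $\mathcal{L}_2$ consisting of all energy-limited signals
that vanish outside the interval $[-T/2, T/2]$, so, a fortiori,

\[ \int_{-T/2}^{T/2} \phi_\ell(t)\, \phi_{\ell'}(t) \,dt = \lambda_\ell\, \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{N}. \tag{8.50c} \]

(iii) For every $t \in \mathbb{R}$,

\[ \int_{-T/2}^{T/2} \mathrm{LPF}_W(t - \tau)\, \phi_\ell(\tau) \,d\tau = \lambda_\ell\, \phi_\ell(t), \quad \ell \in \mathbb{N}. \tag{8.50d} \]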

The above functions $\phi_1, \phi_2, \ldots$ are related to Prolate Spheroidal Wave Functions.
For a discussion of this connection, a proof of this theorem, and numerous
applications see (Slepian and Pollak, 1961) and (Slepian, 1976).

8.8 Exercises

Exercise 8.1 (Expansion of a Function). Expand the function $t \mapsto \mathrm{sinc}^2(t/2)$ as an
orthonormal expansion in the functions

\[ \ldots,\; t \mapsto \mathrm{sinc}(t + 2),\; t \mapsto \mathrm{sinc}(t + 1),\; t \mapsto \mathrm{sinc}(t),\; t \mapsto \mathrm{sinc}(t - 1),\; t \mapsto \mathrm{sinc}(t - 2),\; \ldots \]

Exercise 8.2 (Inner Product with a Bandlimited Signal). Show that if $\mathbf{x}$ is an
energy-limited signal that is bandlimited to W Hz, and if $\mathbf{y} \in \mathcal{L}_2$, then

\[ \langle \mathbf{x}, \mathbf{y} \rangle = T_s \sum_{\ell=-\infty}^{\infty} x(\ell T_s)\, y_{\mathrm{LPF}}^{*}(\ell T_s), \]

where $\mathbf{y}_{\mathrm{LPF}}$ is the result of passing $\mathbf{y}$ through an ideal unit-gain lowpass filter of bandwidth
W Hz, and where $T_s = 1/(2W)$.

Exercise 8.3 (Approximating a Sinc by Sincs). Find the coefficients $\{\alpha_\ell\}$ that minimize
the integral

\[ \int_{-\infty}^{\infty} \Bigl( \mathrm{sinc}(3t/2) - \sum_{\ell=-\infty}^{\infty} \alpha_\ell\, \mathrm{sinc}(t - \ell) \Bigr)^2 \,dt. \]

What is the value of this integral when the coefficients are chosen as you suggest?

Exercise 8.4 (Integrability and Summability). Show that if $\mathbf{x}$ is an integrable signal that
is bandlimited to W Hz and if $T_s = 1/(2W)$, then

\[ \sum_{\ell=-\infty}^{\infty} |x(\ell T_s)| < \infty. \]

Hint: Let $\mathbf{h}$ be the IFT of the mapping in (7.15) when we substitute $0$ for $f_c$; $2W$ for $W$;
and $2W + \Delta$ for $W_c$, where $\Delta > 0$. Express $x(\ell T_s)$ as $(\mathbf{x} \star \mathbf{h})(\ell T_s)$; upper-bound the
convolution integral using Proposition 2.4.1; and use Fubini's Theorem to swap the order
of summation and integration.

Exercise 8.5 (Approximating an Integral by a Sum). One often approximates an integral
by a sum, e.g.,

\[ \int_{-\infty}^{\infty} x(t) \,dt \approx \delta \sum_{\ell=-\infty}^{\infty} x(\ell \delta). \]

(i) Show that if $\mathbf{u}$ is an energy-limited signal that is bandlimited to W Hz, then, for
every $0 < \delta \le 1/(2W)$, the above approximation is exact when we substitute $|u(t)|^2$
for $x(t)$, that is,

\[ \int_{-\infty}^{\infty} |u(t)|^2 \,dt = \delta \sum_{\ell=-\infty}^{\infty} |u(\ell \delta)|^2. \]

(ii) Show that if $\mathbf{x}$ is an integrable signal that is bandlimited to W Hz, then, for every
$0 < \delta \le 1/(2W)$,

\[ \int_{-\infty}^{\infty} x(t) \,dt = \delta \sum_{\ell=-\infty}^{\infty} x(\ell \delta). \]

(iii) Consider the signal $\mathbf{u} \colon t \mapsto \mathrm{sinc}(t)$. Compute $\|\mathbf{u}\|_2$ using Parseval's Theorem and
use the result and Part (i) to show that

\[ \sum_{m=0}^{\infty} \frac{1}{(2m + 1)^2} = \frac{\pi^2}{8}. \]
Exercise 8.6 (On the Pointwise Sampling Theorem). 

(i) Let the functions $\mathbf{g}, \mathbf{g}_0, \mathbf{g}_1, \ldots$ be elements of $\mathcal{L}_2$ that are zero outside the interval
$[-W, W]$. Show that if $\|\mathbf{g} - \mathbf{g}_n\|_2 \to 0$, then for every $t \in \mathbb{R}$

\[ \lim_{n \to \infty} \int_{-\infty}^{\infty} g_n(f)\, e^{i 2\pi f t} \,df = \int_{-\infty}^{\infty} g(f)\, e^{i 2\pi f t} \,df. \]

(ii) Use Part (i) to prove the Pointwise Sampling Theorem for energy-limited signals.

Exercise 8.7 (Reconstructing from a Finite Number of Samples). Show that there does
not exist a universal positive integer $L$ such that at $t = T/2$

\[ \Bigl| x(t) - \sum_{\ell=-L}^{L} x(-\ell T)\, \mathrm{sinc}\Bigl(\frac{t}{T} + \ell\Bigr) \Bigr| < 0.1 \]

for all energy-limited signals $\mathbf{x}$ that are bandlimited to $1/(2T)$ Hz.

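Exercise 8.8 (Inner Product between Passband Signals). Let $\mathbf{x}_{\mathrm{PB}}$ and $\mathbf{y}_{\mathrm{PB}}$ be
energy-limited passband signals that are bandlimited to W Hz around the carrier frequency $f_c$.
Let $\mathbf{x}_{\mathrm{BB}}$ and $\mathbf{y}_{\mathrm{BB}}$ be their corresponding baseband representations. Let $T = 1/W$. Show
that

\[ \langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle = 2T\, \mathrm{Re}\Bigl( \sum_{\ell=-\infty}^{\infty} x_{\mathrm{BB}}(\ell T)\, y_{\mathrm{BB}}^{*}(\ell T) \Bigr). \]

Exercise 8.9 (Closed Subspaces). Let $\mathcal{U}$ denote the set of energy-limited signals that
vanish outside some interval. Thus, $\mathbf{u}$ is in $\mathcal{U}$ if, and only if, there exist $a, b \in \mathbb{R}$ (that may
depend on $\mathbf{u}$) such that $u(t)$ is zero whenever $t \notin [a, b]$. Show that $\mathcal{U}$ is a linear subspace
of $\mathcal{L}_2$, but that it is not closed.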

Exercise 8.10 (Projection onto an Infinite-Dimensional Subspace).

(i) Let $\mathcal{U} \subset \mathcal{L}_2$ be the set of all elements of $\mathcal{L}_2$ that are zero outside the interval
$[-1, +1]$. Given $\mathbf{v} \in \mathcal{L}_2$, let $\mathbf{w}$ be the signal $\mathbf{w} \colon t \mapsto v(t)\, \mathrm{I}\{|t| \le 1\}$. Show that $\mathbf{w}$ is
in $\mathcal{U}$ and that $\mathbf{v} - \mathbf{w}$ is orthogonal to every signal in $\mathcal{U}$.

(ii) Let $\mathcal{U}$ be the subspace of energy-limited signals that are bandlimited to W Hz.
Given $\mathbf{v} \in \mathcal{L}_2$, define $\mathbf{w} = \mathbf{v} \star \mathrm{LPF}_W$. Show that $\mathbf{w}$ is in $\mathcal{U}$ and that $\mathbf{v} - \mathbf{w}$ is
orthogonal to every signal in $\mathcal{U}$.

Exercise 8.11 (A Maximization Problem). Of all unit-energy real signals that are
bandlimited to W Hz, which one has the largest value at $t = 0$? What is its value at $t = 0$?
Repeat for $t = 17$.

Chapter 9



Sampling Real Passband Signals 



9.1 Introduction 

In this chapter we present a procedure for representing a real energy-limited passband
signal that is bandlimited to W Hz around a carrier frequency $f_c$ using complex
numbers that we accumulate at a rate of W complex numbers per second.
Alternatively, since we can represent every complex number as a pair of real numbers
(its real and imaginary parts), we can view our procedure as allowing us to
represent the signal using real numbers that we accumulate at a rate of 2W real
numbers per second. Thus we propose to accumulate

2W real samples per second, i.e., W complex samples per second.

Note that the carrier frequency $f_c$ plays no role here (provided, of course, that
$f_c > W/2$): the rate at which we accumulate real numbers to describe the passband
signal does not depend on $f_c$.¹

¹ But the carrier frequency $f_c$ does play a role in the reconstruction.

For real baseband signals this feat is easily accomplished using the Sampling Theorem
as follows. A real energy-limited baseband signal that is bandlimited to W
Hz can be reconstructed from its (real) samples that are taken $1/(2W)$ seconds
apart (Theorem 8.4.3), so the signal can be reconstructed from real numbers (its
samples) that are being accumulated at the rate of 2W real samples per second.

For passband signals we cannot achieve this feat by invoking the Sampling Theorem
directly. Even though, by Corollary 7.7.3, every energy-limited passband signal $\mathbf{x}_{\mathrm{PB}}$
that is bandlimited to W Hz around the center frequency $f_c$ is also an energy-limited
bandlimited (baseband) signal, we are only guaranteed that $\mathbf{x}_{\mathrm{PB}}$ be bandlimited
to $f_c + W/2$ Hz. Consequently, if we were to apply the Sampling Theorem directly
to $\mathbf{x}_{\mathrm{PB}}$ we would have to sample $\mathbf{x}_{\mathrm{PB}}$ every $1/(2f_c + W)$ seconds, i.e., we would
have to accumulate $2f_c + W$ real numbers per second, which can be much higher
than $2W$, especially in wireless communications where $f_c \gg W$.

Instead of applying the Sampling Theorem directly to $x_{PB}$, the idea is to apply it to
the baseband representation $x_{BB}$ of $x_{PB}$. Suppose that $x_{PB}$ is a real energy-limited
passband signal that is bandlimited to W Hz around the carrier frequency $f_c$. By
Theorem 7.7.12 (vii), it can be represented using its baseband representation $x_{BB}$, which
is a complex baseband signal that is bandlimited to W/2 Hz (Theorem 7.7.12 (v)).
Consequently, by the $\mathcal{L}_2$-Sampling Theorem (Theorem 8.4.3), $x_{BB}$ can be described
by sampling it at a rate of W samples per second. Since the baseband signal is
complex, its samples are also, in general, complex. Thus, in sampling $x_{BB}$ every
1/W seconds we are accumulating one complex sample every 1/W seconds. Since
we can recover $x_{PB}$ from $x_{BB}$ and $f_c$, it follows that, as we wanted, we have found
a way to describe $x_{PB}$ using complex numbers that are accumulated at a rate of W
complex numbers per second.
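
To get a feeling for the savings, the following minimal Python sketch compares the two rates discussed above. The function names and the numerical values of the carrier frequency and bandwidth are hypothetical and were chosen only for illustration.

```python
def direct_rate(fc_hz, w_hz):
    """Real samples per second if the Sampling Theorem is applied to x_PB itself.

    x_PB is only guaranteed to be bandlimited to fc + W/2 Hz, so it must be
    sampled every 1/(2 fc + W) seconds.
    """
    return 2.0 * fc_hz + w_hz


def complex_sampling_rate(w_hz):
    """Real numbers per second if the baseband representation is sampled.

    W complex samples per second correspond to 2W real numbers per second.
    """
    return 2.0 * w_hz


fc = 2.4e9  # hypothetical carrier frequency in Hz
w = 20e6    # hypothetical bandwidth around the carrier in Hz
assert fc > w / 2  # the standing assumption fc > W/2

print(direct_rate(fc, w))                             # 2 fc + W
print(complex_sampling_rate(w))                       # 2 W
print(direct_rate(fc, w) / complex_sampling_rate(w))  # ratio of the two rates
```

For these illustrative numbers the direct approach requires roughly two orders of magnitude more real numbers per second.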

9.2 Complex Sampling 

Recall from Section 7.7.3 (Theorem 7.7.12) that a real energy-limited passband
signal $x_{PB}$ that is bandlimited to W Hz around a carrier frequency $f_c$ can be
represented using its baseband representation $x_{BB}$ as

\[
  x_{PB}(t) = 2 \operatorname{Re}\bigl( e^{i 2\pi f_c t}\, x_{BB}(t) \bigr), \quad t \in \mathbb{R}, \tag{9.1}
\]

where $x_{BB}$ is given by

\[
  x_{BB} = \bigl( t \mapsto e^{-i 2\pi f_c t}\, x_{PB}(t) \bigr) \star \mathrm{LPF}_{W_c}, \tag{9.2}
\]

and where the cutoff frequency $W_c$ can be chosen arbitrarily in the range

\[
  \frac{W}{2} \le W_c \le 2 f_c - \frac{W}{2}. \tag{9.3}
\]

The signal $x_{BB}$ is an energy-limited complex baseband signal that is bandlimited
to W/2 Hz. Being bandlimited to W/2 Hz, it follows from the $\mathcal{L}_2$-Sampling
Theorem that $x_{BB}$ can be reconstructed from its samples taken $1/(2\,(W/2)) = 1/W$
seconds apart. We denote these samples by

\[
  x_{BB}\Bigl(\frac{\ell}{W}\Bigr), \quad \ell \in \mathbb{Z}, \tag{9.4}
\]

so, by (9.2),

\[
  x_{BB}\Bigl(\frac{\ell}{W}\Bigr) = \Bigl( \bigl( t \mapsto e^{-i 2\pi f_c t}\, x_{PB}(t) \bigr) \star \mathrm{LPF}_{W_c} \Bigr)\Bigl(\frac{\ell}{W}\Bigr), \quad \ell \in \mathbb{Z}. \tag{9.5}
\]

These samples are, in general, complex. Their real part corresponds to the samples
of the in-phase component $\operatorname{Re}(x_{BB})$, which, by (7.41a), is given by

\[
  \operatorname{Re}(x_{BB}) = \bigl( t \mapsto x_{PB}(t) \cos(2\pi f_c t) \bigr) \star \mathrm{LPF}_{W_c} \tag{9.6}
\]
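[Figure 9.1: Sampling of a real passband signal $x_{PB}$. The signal $x_{PB}(t)$ is multiplied by $\cos(2\pi f_c t)$ in the upper branch and, using a 90° phase shifter, by $-\sin(2\pi f_c t)$ in the lower branch. Each product is fed to a lowpass filter $\mathrm{LPF}_{W_c}$ with $W/2 \le W_c \le 2f_c - W/2$ to produce $\operatorname{Re}(x_{BB}(t))$ and $\operatorname{Im}(x_{BB}(t))$, which are sampled at the times $\ell/W$ to yield $\operatorname{Re}(x_{BB}(\ell/W))$ and $\operatorname{Im}(x_{BB}(\ell/W))$.]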



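(for $W_c$ satisfying (9.3)) and their imaginary part corresponds to the samples of
the quadrature component $\operatorname{Im}(x_{BB})$, which, by (7.41b), is given by

\[
  \operatorname{Im}(x_{BB}) = -\bigl( t \mapsto x_{PB}(t) \sin(2\pi f_c t) \bigr) \star \mathrm{LPF}_{W_c}. \tag{9.7}
\]

Thus,

\[
  x_{BB}\Bigl(\frac{\ell}{W}\Bigr)
  = \Bigl( \bigl( t \mapsto x_{PB}(t) \cos(2\pi f_c t) \bigr) \star \mathrm{LPF}_{W_c} \Bigr)\Bigl(\frac{\ell}{W}\Bigr)
  - i \Bigl( \bigl( t \mapsto x_{PB}(t) \sin(2\pi f_c t) \bigr) \star \mathrm{LPF}_{W_c} \Bigr)\Bigl(\frac{\ell}{W}\Bigr),
  \quad \ell \in \mathbb{Z}. \tag{9.8}
\]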



The procedure of taking a real passband signal $x_{PB}$ and sampling its baseband
representation to obtain the samples (9.8) is called complex sampling. It is
depicted in Figure 9.1. The passband signal $x_{PB}$ is first separately multiplied
by $t \mapsto \cos(2\pi f_c t)$ and by $t \mapsto -\sin(2\pi f_c t)$, which are generated using a local
oscillator and a 90°-phase shifter. Each result is fed to a lowpass filter with cutoff
frequency $W_c$ to produce the in-phase and quadrature components, respectively.
Each component is then sampled at a rate of W real samples per second.
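
To see why the lowpass filters in this procedure leave exactly the baseband representation, note that, by (9.1), multiplying $x_{PB}(t)$ by $e^{-i2\pi f_c t}$ gives $x_{BB}(t) + x_{BB}^*(t)\, e^{-i4\pi f_c t}$, and the second term is centered around the frequency $-2f_c$. The following Python sketch checks this pointwise identity numerically. The test signal `x_bb_example` is a hypothetical smooth complex signal used only for illustration (the identity itself does not require bandlimitedness), and the function names are our own.

```python
import cmath
import math


def x_bb_example(t):
    """Hypothetical complex baseband signal, for illustration only."""
    return complex(math.exp(-t * t), t * math.exp(-t * t))


def passband(x_bb, fc, t):
    """x_PB(t) = 2 Re(exp(i 2 pi fc t) x_BB(t)), as in (9.1)."""
    return 2.0 * (cmath.exp(2j * math.pi * fc * t) * x_bb(t)).real


def mix_down(x_bb, fc, t):
    """exp(-i 2 pi fc t) x_PB(t): the signal fed to the lowpass filter in (9.2)."""
    return cmath.exp(-2j * math.pi * fc * t) * passband(x_bb, fc, t)


def baseband_plus_image(x_bb, fc, t):
    """x_BB(t) plus the term around -2 fc that the lowpass filter removes."""
    return x_bb(t) + x_bb(t).conjugate() * cmath.exp(-4j * math.pi * fc * t)


fc = 10.0  # hypothetical carrier frequency
for t in (-0.7, 0.0, 0.3, 1.1):
    mixed = mix_down(x_bb_example, fc, t)
    assert abs(mixed - baseband_plus_image(x_bb_example, fc, t)) < 1e-12
    # Real and imaginary parts are the two branches of Figure 9.1 before filtering.
    x_pb = passband(x_bb_example, fc, t)
    assert abs(mixed.real - x_pb * math.cos(2 * math.pi * fc * t)) < 1e-12
    assert abs(mixed.imag + x_pb * math.sin(2 * math.pi * fc * t)) < 1e-12
```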



9.3 Reconstructing $x_{PB}$ from its Complex Samples



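By the Pointwise Sampling Theorem (Theorem 8.4.5) applied to the energy-limited
signal $x_{BB}$ (which is bandlimited to W/2 Hz) we obtain

\[
  x_{BB}(t) = \sum_{\ell=-\infty}^{\infty} x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \operatorname{sinc}(Wt - \ell), \quad t \in \mathbb{R}. \tag{9.9}
\]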
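Consequently, by (9.1), $x_{PB}$ can be reconstructed from its complex samples as

\[
  x_{PB}(t) = 2 \operatorname{Re}\Bigl( e^{i 2\pi f_c t} \sum_{\ell=-\infty}^{\infty} x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \operatorname{sinc}(Wt - \ell) \Bigr), \quad t \in \mathbb{R}. \tag{9.10a}
\]

Since the sinc(·) function is real, this can also be written as

\[
  x_{PB}(t) = 2 \sum_{\ell=-\infty}^{\infty} \operatorname{Re}\Bigl( e^{i 2\pi f_c t}\, x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \Bigr) \operatorname{sinc}(Wt - \ell), \quad t \in \mathbb{R}, \tag{9.10b}
\]

or, using real operations, as

\[
  x_{PB}(t) = 2 \sum_{\ell=-\infty}^{\infty} \operatorname{Re}\Bigl( x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \Bigr) \operatorname{sinc}(Wt - \ell) \cos(2\pi f_c t)
  - 2 \sum_{\ell=-\infty}^{\infty} \operatorname{Im}\Bigl( x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \Bigr) \operatorname{sinc}(Wt - \ell) \sin(2\pi f_c t), \quad t \in \mathbb{R}. \tag{9.10c}
\]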
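
The following Python sketch evaluates the reconstruction formulas (9.10a) and (9.10c) for a signal with only finitely many nonzero complex samples and checks that the two forms agree. It assumes the convention $\operatorname{sinc}(\xi) = \sin(\pi\xi)/(\pi\xi)$ with $\operatorname{sinc}(0) = 1$. The sample values, the bandwidth, and the carrier frequency are hypothetical, as are the function names.

```python
import cmath
import math


def sinc(xi):
    """sinc(xi) = sin(pi xi) / (pi xi), with sinc(0) = 1 (assumed convention)."""
    return 1.0 if xi == 0 else math.sin(math.pi * xi) / (math.pi * xi)


def reconstruct_complex(samples, w, fc, t):
    """Form (9.10a); `samples` maps the index l to the complex sample x_BB(l/W)."""
    x_bb = sum(s * sinc(w * t - ell) for ell, s in samples.items())
    return 2.0 * (cmath.exp(2j * math.pi * fc * t) * x_bb).real


def reconstruct_real(samples, w, fc, t):
    """Form (9.10c): in-phase and quadrature sums using real operations only."""
    in_phase = sum(s.real * sinc(w * t - ell) for ell, s in samples.items())
    quadrature = sum(s.imag * sinc(w * t - ell) for ell, s in samples.items())
    return (2.0 * in_phase * math.cos(2 * math.pi * fc * t)
            - 2.0 * quadrature * math.sin(2 * math.pi * fc * t))


# Hypothetical nonzero complex samples; all other samples are taken to be zero.
samples = {-1: 0.5 - 1.0j, 0: 2.0 + 0.5j, 2: -1.0 + 1.5j}
w, fc = 4.0, 50.0

for t in (-0.4, 0.0, 0.13, 0.5):
    a = reconstruct_complex(samples, w, fc, t)
    b = reconstruct_real(samples, w, fc, t)
    assert abs(a - b) < 1e-9
    print(t, a)
```

At $t = 0$ only the zeroth sample contributes, so the reconstructed value is twice the real part of that sample.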

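As we next show, we can obtain another form of convergence using the $\mathcal{L}_2$-Sampling
Theorem (Theorem 8.4.3). We first note that by that theorem

\[
  \lim_{L \to \infty} \Bigl\| t \mapsto x_{BB}(t) - \sum_{\ell=-L}^{L} x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \operatorname{sinc}(Wt - \ell) \Bigr\|_2 = 0. \tag{9.11}
\]

We next note that $x_{BB}$ is the baseband representation of $x_{PB}$ and that (as can be
verified directly or by using Proposition 7.7.9) the mapping

\[
  t \mapsto x_{BB}(\ell/W) \operatorname{sinc}(Wt - \ell)
\]

is the baseband representation of the real passband signal

\[
  t \mapsto 2 \operatorname{Re}\Bigl( e^{i 2\pi f_c t}\, x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \operatorname{sinc}(Wt - \ell) \Bigr).
\]

Consequently, by linearity (Theorem 7.7.12 (ii)), the mapping

\[
  t \mapsto x_{BB}(t) - \sum_{\ell=-L}^{L} x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \operatorname{sinc}(Wt - \ell)
\]

is the baseband representation of the real passband signal

\[
  t \mapsto x_{PB}(t) - 2 \operatorname{Re}\Bigl( e^{i 2\pi f_c t} \sum_{\ell=-L}^{L} x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \operatorname{sinc}(Wt - \ell) \Bigr)
\]

and hence, by Theorem 7.7.12 (iii),

\[
  \Bigl\| t \mapsto x_{PB}(t) - 2 \operatorname{Re}\Bigl( e^{i 2\pi f_c t} \sum_{\ell=-L}^{L} x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \operatorname{sinc}(Wt - \ell) \Bigr) \Bigr\|_2^2
  = 2\, \Bigl\| t \mapsto x_{BB}(t) - \sum_{\ell=-L}^{L} x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \operatorname{sinc}(Wt - \ell) \Bigr\|_2^2. \tag{9.12}
\]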
Combining (9.11) with (9.12) yields the $\mathcal{L}_2$ convergence

\[
  \lim_{L \to \infty} \Bigl\| t \mapsto x_{PB}(t) - 2 \operatorname{Re}\Bigl( e^{i 2\pi f_c t} \sum_{\ell=-L}^{L} x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \operatorname{sinc}(Wt - \ell) \Bigr) \Bigr\|_2 = 0. \tag{9.13}
\]



We summarize how a passband signal can be reconstructed from the samples of its 
baseband representation in the following theorem. 

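Theorem 9.3.1 (The Sampling Theorem for Passband Signals). Let $x_{PB}$ be a
real energy-limited passband signal that is bandlimited to W Hz around the carrier
frequency $f_c$. For every integer $\ell$, let $x_{BB}(\ell/W)$ denote the time-$\ell/W$ sample of the
baseband representation $x_{BB}$ of $x_{PB}$; see (9.5) and (9.8).

(i) $x_{PB}$ can be pointwise reconstructed from the samples using the relation

\[
  x_{PB}(t) = 2 \operatorname{Re}\Bigl( e^{i 2\pi f_c t} \sum_{\ell=-\infty}^{\infty} x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \operatorname{sinc}(Wt - \ell) \Bigr), \quad t \in \mathbb{R}.
\]

(ii) $x_{PB}$ can also be reconstructed from the samples in the $\mathcal{L}_2$ sense

\[
  \lim_{L \to \infty} \int_{-\infty}^{\infty} \Bigl( x_{PB}(t) - 2 \operatorname{Re}\Bigl( e^{i 2\pi f_c t} \sum_{\ell=-L}^{L} x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \operatorname{sinc}(Wt - \ell) \Bigr) \Bigr)^2 \, dt = 0.
\]

(iii) The energy in $x_{PB}$ can be reconstructed from the sum of the squared magnitudes
of the samples via

\[
  \| x_{PB} \|_2^2 = \frac{2}{W} \sum_{\ell=-\infty}^{\infty} \Bigl| x_{BB}\Bigl(\frac{\ell}{W}\Bigr) \Bigr|^2.
\]

(iv) If $y_{PB}$ is another real energy-limited passband signal that is bandlimited to
W Hz around $f_c$, and if $\{y_{BB}(\ell/W)\}$ are the samples of its baseband
representation, then

\[
  \langle x_{PB}, y_{PB} \rangle = \frac{2}{W} \operatorname{Re}\Bigl( \sum_{\ell=-\infty}^{\infty} x_{BB}\Bigl(\frac{\ell}{W}\Bigr)\, y_{BB}^*\Bigl(\frac{\ell}{W}\Bigr) \Bigr).
\]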



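Proof. Part (i) is just a restatement of (9.10a). Part (ii) is a restatement of (9.13).
Part (iii) is a special case of Part (iv) corresponding to $y_{PB}$ being equal to $x_{PB}$. It
thus only remains to prove Part (iv). This is done by noting that if $x_{BB}$ and $y_{BB}$
are the baseband representations of $x_{PB}$ and $y_{PB}$, then, by Theorem 7.7.12 (iv),

\[
  \langle x_{PB}, y_{PB} \rangle = 2 \operatorname{Re}\bigl( \langle x_{BB}, y_{BB} \rangle \bigr)
  = \frac{2}{W} \operatorname{Re}\Bigl( \sum_{\ell=-\infty}^{\infty} x_{BB}\Bigl(\frac{\ell}{W}\Bigr)\, y_{BB}^*\Bigl(\frac{\ell}{W}\Bigr) \Bigr),
\]

where the second equality follows from Theorem 8.4.3 (iii).

□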
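
As a numerical sanity check of Part (iii), the following Python sketch builds a passband signal from two nonzero complex samples, approximates its energy by a Riemann sum over a long but finite interval, and compares the result with $(2/W) \sum_\ell |x_{BB}(\ell/W)|^2$. All numerical values are hypothetical, the sinc convention $\sin(\pi\xi)/(\pi\xi)$ is assumed, and because the integration interval is truncated the two numbers agree only approximately.

```python
import math


def sinc(xi):
    """sinc(xi) = sin(pi xi) / (pi xi), with sinc(0) = 1 (assumed convention)."""
    return 1.0 if xi == 0 else math.sin(math.pi * xi) / (math.pi * xi)


def x_pb(samples, w, fc, t):
    """Passband signal built from finitely many complex samples via (9.10c)."""
    in_phase = sum(s.real * sinc(w * t - ell) for ell, s in samples.items())
    quadrature = sum(s.imag * sinc(w * t - ell) for ell, s in samples.items())
    return 2.0 * (in_phase * math.cos(2 * math.pi * fc * t)
                  - quadrature * math.sin(2 * math.pi * fc * t))


def energy_numeric(samples, w, fc, half_width=100.0, step=0.01):
    """Riemann-sum approximation of the energy over [-half_width, half_width]."""
    n = int(round(2 * half_width / step))
    return step * sum(x_pb(samples, w, fc, -half_width + k * step) ** 2
                      for k in range(n + 1))


def energy_from_samples(samples, w):
    """Right-hand side of Theorem 9.3.1 (iii)."""
    return (2.0 / w) * sum(abs(s) ** 2 for s in samples.values())


samples = {0: 1.0 + 1.0j, 1: 0.5 - 2.0j}  # hypothetical complex samples
w, fc = 1.0, 5.0                          # hypothetical bandwidth and carrier

print(energy_from_samples(samples, w))  # exact value from the samples
print(energy_numeric(samples, w, fc))   # slightly smaller because of truncation
```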
Using the isomorphism between the family of complex square-summable sequences
and the family of energy-limited signals that are bandlimited to W Hz (Theo- 
rem 8.6.1), and using the relationship between real energy-limited passband signals 
and their baseband representation (Theorem 7.7.12), we can readily establish the 
following isomorphism between the family of complex square-summable sequences 
and the family of real energy-limited passband signals. 

Theorem 9.3.2 (Real Passband Signals and Square-Summable Sequences). Let
$f_c$, W, and T be constants satisfying

\[
  f_c > W/2 > 0, \quad T = 1/W.
\]

(i) If $x_{PB}$ is a real energy-limited passband signal that is bandlimited to W Hz
around $f_c$, and if $x_{BB}$ is its baseband representation, then the bi-infinite
sequence consisting of the samples of $x_{BB}$ at integer multiples of T

\[
  \ldots, x_{BB}(-T),\ x_{BB}(0),\ x_{BB}(T),\ x_{BB}(2T), \ldots
\]

is a square-summable sequence of complex numbers and

\[
  2T \sum_{\ell=-\infty}^{\infty} | x_{BB}(\ell T) |^2 = \| x_{PB} \|_2^2.
\]

(ii) More generally, if $x_{PB}$ and $y_{PB}$ are real energy-limited passband signals that
are bandlimited to W Hz around the carrier frequency $f_c$, and if $x_{BB}$ and
$y_{BB}$ are their baseband representations, then

\[
  2T \operatorname{Re}\Bigl( \sum_{\ell=-\infty}^{\infty} x_{BB}(\ell T)\, y_{BB}^*(\ell T) \Bigr) = \langle x_{PB}, y_{PB} \rangle.
\]

(iii) If $\ldots, \alpha_{-1}, \alpha_0, \alpha_1, \ldots$ is a square-summable bi-infinite sequence of complex
numbers, then there exists a real energy-limited passband signal $x_{PB}$ that is
bandlimited to W Hz around the carrier frequency $f_c$ such that the samples
of its baseband representation $x_{BB}$ are given by

\[
  x_{BB}(\ell T) = \alpha_\ell, \quad \ell \in \mathbb{Z}.
\]

(iv) The mapping of every real energy-limited passband signal that is bandlimited
to W Hz around $f_c$ to the square-summable sequence consisting of the samples
of its baseband representation is linear (over $\mathbb{R}$).



9.4 Exercises 

Exercise 9.1 (A Specific Signal). Let x be a real energy-limited passband signal that
is bandlimited to W Hz around the carrier frequency $f_c$. Suppose that all its complex
samples are zero except for its zeroth complex sample, which is given by $1 + i$. What
is x?






Exercise 9.2 (Real Passband Signals whose Complex Samples Are Real). Characterize
the Fourier Transforms of real energy-limited passband signals that are bandlimited to W
Hz around the carrier frequency $f_c$ and whose complex samples are real.

Exercise 9.3 (Multiplying by a Carrier). Let x be a real energy-limited signal that is
bandlimited to W/2 Hz, and let $f_c$ be larger than W/2. Express the complex samples of
$t \mapsto x(t) \cos(2\pi f_c t)$ in terms of x. Repeat for $t \mapsto x(t) \sin(2\pi f_c t)$.

Exercise 9.4 (Naively Sampling a Passband Signal). 

(i) Consider the signal $x\colon t \mapsto m(t) \sin(2\pi f_c t)$, where m(·) is an integrable signal that
is bandlimited to 100 Hz and where $f_c = 100$ MHz. Can x be recovered from its
samples $\ldots, x(-T), x(0), x(T), \ldots$ when 1/T = 100 MHz?

(ii) Consider now the general case where x is an integrable real passband signal that is
bandlimited to W Hz around the carrier frequency $f_c$. Find conditions guaranteeing
that x be reconstructible from its samples $\ldots, x(-T), x(0), x(T), \ldots$

Exercise 9.5 (Orthogonal Passband Signals). Let $x_{PB}$ and $y_{PB}$ be real energy-limited
passband signals that are bandlimited to W Hz around the carrier frequency $f_c$. Under
what conditions on their complex samples are they orthogonal?

Exercise 9.6 (Sampling a Baseband Signal As Though It Were a Passband Signal). Recall
that, ignoring some technicalities, a real baseband signal x of bandwidth W Hz can be
viewed as a real passband signal of bandwidth W around the carrier frequency $f_c$, where
$f_c = W/2$ (Problem 7.3). Compare the reconstruction formula for x from its samples to
the reconstruction formula for x from its complex samples.

Exercise 9.7 (Multiplying the Complex Samples). Let x be a real energy-limited passband
signal that is bandlimited to W Hz around the carrier frequency $f_c$. Let $\ldots, x_{-1}, x_0, x_1, \ldots$
denote its complex samples taken 1/W seconds apart. Let y be a real energy-limited
passband signal that is bandlimited to W Hz around the carrier frequency $f_c$ and whose
complex samples are like those of x but multiplied by $i$. Relate the FT of y to the FT
of x.

Exercise 9.8 (Delayed Complex Sampling). Let x and y be real energy-limited passband
signals that are bandlimited to W Hz around the carrier frequency $f_c$. Suppose that the
complex samples of y are the same as those of x, but delayed by one:

\[
  y_{BB}\Bigl(\frac{\ell}{W}\Bigr) = x_{BB}\Bigl(\frac{\ell - 1}{W}\Bigr), \quad \ell \in \mathbb{Z}.
\]

How are x and y related? Is y a delayed version of x?



Exercise 9.9 (On the Family of Real Passband Signals). Is the set of all real
energy-limited passband signals that are bandlimited to W Hz around the carrier frequency $f_c$
a linear subspace of the set of all complex energy-limited signals?

Exercise 9.10 (Complex Sampling and Inner Products). Show that the $\ell$-th complex
sample $x_{BB}(\ell/W)$ of any real energy-limited passband signal x that is bandlimited to W
Hz around the carrier frequency $f_c$ can be expressed as an inner product

\[
  x_{BB}\Bigl(\frac{\ell}{W}\Bigr) = \langle x, \phi_\ell \rangle, \quad \ell \in \mathbb{Z},
\]

where $\ldots, \phi_{-1}, \phi_0, \phi_1, \ldots$ are orthogonal equi-energy complex signals. Is $\phi_\ell$ in general
a delayed version of $\phi_0$?




Exercise 9.11 (Absolute Summability of the Complex Samples). Show that the complex
samples of a real integrable passband signal that is bandlimited to W Hz around the
carrier frequency $f_c$ must be absolutely summable.

Hint: See Exercise 8.4.

Exercise 9.12 (The Convolution Revisited). Let x and y be real integrable passband
signals that are bandlimited to W Hz around the carrier frequency $f_c$. Express the
complex samples of $x \star y$ in terms of those of x and y.

Exercise 9.13 (Complex Sampling and Filtering). Let x be a real integrable passband
signal that is bandlimited to W Hz around the carrier frequency $f_c$, and let h be the
impulse response of a real stable filter. Relate the complex samples of $x \star h$ to those of x
and $h \star \mathrm{BPF}_{W, f_c}$.



Chapter 10 

Mapping Bits to Waveforms 

10.1 What Is Modulation? 

Data bits are mathematical entities that have no physical attributes. To send them 
over a channel, one needs to first map them into some physical signal, which is 
then "fed" into a channel to produce a physical signal at the channel's output. For 
example, when we send data over a telephone line, the data bits are first converted 
to an electrical signal, which then influences the voltage measured at the other 
end of the line. (We use the term "influences" because the signal measured at the 
other end of the line is usually not identical to the channel input: it is typically 
attenuated and also corrupted by thermal noise and other distortions introduced 
by various conversions in the telephone exchange system.) Similarly, in a wireless 
system, the data bits are mapped to an electromagnetic wave that then influences 
the electromagnetic field measured at the receiver antenna. In magnetic recording, 
data bits are written onto a magnetic medium by a mapping that maps them to 
a magnetization pattern, which is then measured (with some distortion and some 
noise) by the magnetic head at some later time when the data are read. 

In the first example the bits are mapped to continuous-time waveforms correspond- 
ing to the voltage across an impedance, whereas in the last example the bits are 
mapped to a spatial waveform corresponding to different magnetizations at dif- 
ferent locations across the magnetic medium. While some of the theory we shall 
develop holds for both cases, we shall focus here mainly on channels of the former 
type, where the channel input signal is some function of time rather than space. 

We shall further focus on cases where the channel input corresponds to a time- 
varying voltage across a resistor, a time-varying current through a resistor, or a 
time-varying electric field, so the energy required to transmit the signal is proportional
to the time integral of its square. Thus, if $x(t)$ denotes the channel input at
time $t$, then we shall refer to $\int_t^{t+\Delta} x^2(\tau)\, d\tau$ as the transmitted energy during the
time interval beginning at time $t$ and ending at time $t + \Delta$.

There are many mappings of bits to waveforms, and our goal is to find "good" ones. 
We will, of course, have to define some figures of merit to compare the quality of 
different mappings. We shall refer to the mapping of bits to a physical waveform 
as modulation and to the part of the system that performs the modulation as the 
modulator.

Without going into too much detail, we can list a few qualitative requirements of a 
modulator. The modulation should be robust with respect to channel impairments, 
so that the receiver at the other end of the channel can reliably decode the data bits 
from the channel output. Also, the modulator should have reasonable complexity. 
Finally, in many applications we require that the transmitted signal be of limited 
power so as to preserve the battery. In wireless applications the transmitted signal 
may also be subject to spectral restrictions so as to not interfere with other systems. 

10.2 Modulating One Bit 

One does not typically expect to design a communication system in order to convey 
only one data bit. The purpose of the modulator is typically to map an entire bit 
stream to a waveform that extends over the entire life of the communication system. 
Nevertheless, for pedagogic reasons, it is good to first consider the simplest scenario 
of modulating a single bit. In this case the modulator is fully characterized by two 
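functions $x_0(\cdot)$ and $x_1(\cdot)$ with the understanding that if the data bit D is equal
to zero, then the modulator produces the waveform $x_0(\cdot)$ and that otherwise it
produces $x_1(\cdot)$. Thus, the signal produced by the modulator is given by

\[
  X(t) = \begin{cases} x_0(t) & \text{if } D = 0, \\ x_1(t) & \text{if } D = 1, \end{cases} \quad t \in \mathbb{R}. \tag{10.1}
\]

For example, we could choose

\[
  x_0(t) = \begin{cases} A\, e^{-t/T} & \text{if } t/T \ge 0, \\ 0 & \text{otherwise}, \end{cases} \quad t \in \mathbb{R},
\]

and

\[
  x_1(t) = \begin{cases} A & \text{if } 0 \le t/T \le 1, \\ 0 & \text{otherwise}, \end{cases} \quad t \in \mathbb{R},
\]

where T = 1 sec and where A is a constant such that $A^2$ has units of power.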

This may seem like an odd way of writing these waveforms, but we have our
reasons: we typically think of $t$ as having units of time, and we try to avoid
applying transcendental functions (such as the exponential function) to quantities
with units. Also, we think of the squared transmitted waveform as having units
of power, whereas we think of the transcendental functions as returning unit-less
values. Hence the introduction of the constant A with the understanding that
$A^2$ has units of power.

We denoted the bit to be sent by an uppercase letter (D) because we like to de- 
note random quantities (such as random variables, random vectors, and stochastic 
processes) by uppercase letters, and we think of the transmitted bit as a random 
quantity. Indeed, if the transmitted bit were deterministic, there would be no 
need to transmit it! This may seem like a statement made in jest, but it is ac- 
tually very important. In the first half of the twentieth century, engineers often 
analyzed the performance of (analog) communication systems by analyzing their
performance in transmitting some particular signal, e.g., a sine wave. Nobody, of 
course, transmitted such "boring" signals, because those could always be produced 
at the receiver using a local oscillator. In the second half of the twentieth century, 
especially following the work of Claude Shannon, engineers realized that it is only 
meaningful to view the data to be transmitted as random, i.e., as quantities that 
are unknown at the receiver and also unknown to the system designer prior to the 
system's deployment. We thus view the bit to be sent D as a random variable. 
Often we will assume that it takes on the values 0 and 1 equiprobably. This is a
good assumption if prior to transmission a data compression algorithm is used. 

By the same token, we view the transmitted signal as a random quantity, and
hence the uppercase X. In fact, if we employ the above signaling scheme, then at
every time instant $t' \in \mathbb{R}$ the value $X(t')$ of the transmitted waveform is a random
variable. For example, at time T/2 the value of the transmitted waveform is X(T/2),
which is a random variable that takes on the values $A e^{-1/2}$ and $A$ equiprobably.
Similarly, at time 2T the value of the transmitted waveform is X(2T), which is a
random variable taking on the values $A e^{-2}$ and 0 equiprobably. Mathematicians call
such a waveform a random process or a stochastic process (SP). This will be
defined formally in Section 12.2.

It is useful to think about a random process as a function of two arguments: time 
and "luck" or, more precisely, as a function of time and the result of all the random 
experiments in the system. For a fixed instant of time $t \in \mathbb{R}$, we have that X(t)
is a random variable, i.e., a real-valued function of the randomness in the system
(in this case the realization of D). Alternatively, for a fixed realization of the 
randomness in the system, the random process is a deterministic function of time. 
These two views will be used interchangeably in this book. 
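
The two views of the transmitted waveform can be illustrated with a short Python sketch of the single-bit modulator (10.1). It assumes the example waveforms $x_0(t) = A e^{-t/T}$ for $t/T \ge 0$ and $x_1(t) = A$ for $0 \le t/T \le 1$ (both zero otherwise); the numerical value chosen for A and the function names are hypothetical.

```python
import math
import random

A = 1.0  # illustrative amplitude; A**2 is thought of as having units of power
T = 1.0  # T = 1 sec


def x0(t):
    """Waveform sent when D = 0 (assumed form of the example)."""
    return A * math.exp(-t / T) if t / T >= 0 else 0.0


def x1(t):
    """Waveform sent when D = 1 (assumed form of the example)."""
    return A if 0 <= t / T <= 1 else 0.0


def transmitted(d, t):
    """X(t) as a function of the realization d of the data bit, as in (10.1)."""
    return x0(t) if d == 0 else x1(t)


# First view: fix the time and vary the "luck". X(T/2) is a random variable
# taking the two values below equiprobably when D is equiprobable.
print({d: transmitted(d, T / 2) for d in (0, 1)})

# Second view: fix the "luck" and vary the time. For a given realization of D
# the waveform is a deterministic function of time.
d = random.randint(0, 1)
print(d, [transmitted(d, t) for t in (0.0, 0.5, 1.0, 2.0)])
```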



10.3 From Bits to Real Numbers 

Many of the popular modulation schemes can be viewed as operating in two stages. 
In the first stage the data bits are mapped to real numbers, and in the second stage 
the real numbers are mapped to a continuous-time waveform. If we denote by k the 
number of data bits that will be transmitted by the system during its lifetime (or 
from the moment it is turned on until it is turned off), and if we denote the data 
bits by $D_1, D_2, \ldots, D_k$, then the first stage can be described as the application of
a mapping $\varphi(\cdot)$ that maps length-$k$ sequences of bits to length-$n$ sequences of real
numbers:

\[
  \varphi\colon \{0,1\}^k \to \mathbb{R}^n, \quad (d_1, \ldots, d_k) \mapsto (x_1, \ldots, x_n).
\]

From an engineering point of view, it makes little sense to allow for the encoding
function to map two different binary $k$-tuples to the same real $n$-tuple, because
this would result in the transmitted waveforms corresponding to the two $k$-tuples
being identical. This may cause errors even in the absence of noise. We shall
therefore assume throughout that the mapping $\varphi(\cdot)$ is one-to-one (injective) so
no two distinct data $k$-tuples are mapped to the same $n$-tuple of real numbers.

An example of a mapping that maps bits to real numbers is the mapping that maps
each data bit $D_j$ to the real number $X_j$ according to the rule

\[
  X_j = \begin{cases} +1 & \text{if } D_j = 0, \\ -1 & \text{if } D_j = 1, \end{cases} \quad j = 1, \ldots, k. \tag{10.2}
\]

In this example one real symbol $X_j$ is produced for every data bit, so $n = k$. For
this reason we say that this mapping has the rate of one bit per real symbol.

As another example consider the case where $k$ is even and the data bits $\{D_j\}$ are
broken into pairs

\[
  (D_1, D_2), (D_3, D_4), \ldots, (D_{k-1}, D_k)
\]

and each pair of data bits is then mapped to a single real number according to the
rule

\[
  (D_{2j-1}, D_{2j}) \mapsto \begin{cases}
    +3 & \text{if } D_{2j-1} = D_{2j} = 0, \\
    +1 & \text{if } D_{2j-1} = 0 \text{ and } D_{2j} = 1, \\
    -3 & \text{if } D_{2j-1} = D_{2j} = 1, \\
    -1 & \text{if } D_{2j-1} = 1 \text{ and } D_{2j} = 0,
  \end{cases} \quad j = 1, \ldots, k/2. \tag{10.3}
\]

In this case $n = k/2$, and we say that the mapping has the rate of two bits per real
symbol.

Note that the rate of the mapping could also be a fraction. Indeed, if each data
bit $D_j$ produces two real numbers according to the repetition law

\[
  D_j \mapsto \begin{cases} (+1, +1) & \text{if } D_j = 0, \\ (-1, -1) & \text{if } D_j = 1, \end{cases} \quad j = 1, \ldots, k, \tag{10.4}
\]

then $n = 2k$, and we say that the mapping is of rate half a bit per real symbol.

Since there is a natural correspondence between $\mathbb{R}^2$ and $\mathbb{C}$, i.e., between pairs of real
numbers and complex numbers (where a pair of real numbers $(x, y)$ corresponds
to the complex number $x + iy$), the rate of the above mapping (10.4) can also be
stated as one bit per complex symbol. This may seem like an odd way of stating the
rate, but it has some advantages that will become apparent later when we discuss
the mapping of real (or complex) numbers to waveforms and the Nyquist Criterion.
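
The three example mappings of this section can be sketched in Python as follows. The function names are our own, and the repetition mapping assumes that (10.4) sends 0 to the pair (+1, +1) and 1 to the pair (−1, −1).

```python
def map_one_bit_per_symbol(bits):
    """Rule (10.2): 0 -> +1 and 1 -> -1, so n = k."""
    return [1 if d == 0 else -1 for d in bits]


# Rule (10.3): each pair of bits is mapped to one of four real numbers.
PAIR_TO_SYMBOL = {(0, 0): 3, (0, 1): 1, (1, 1): -3, (1, 0): -1}


def map_two_bits_per_symbol(bits):
    """Rule (10.3): requires an even number of bits, so n = k / 2."""
    if len(bits) % 2 != 0:
        raise ValueError("the number of data bits must be even")
    return [PAIR_TO_SYMBOL[(bits[i], bits[i + 1])]
            for i in range(0, len(bits), 2)]


def map_repetition(bits):
    """Assumed repetition law (10.4): each bit yields two symbols, so n = 2k."""
    symbols = []
    for d in bits:
        s = 1 if d == 0 else -1
        symbols.extend([s, s])
    return symbols


bits = [0, 1, 1, 0]
for mapping in (map_one_bit_per_symbol, map_two_bits_per_symbol, map_repetition):
    symbols = mapping(bits)
    # The rate in bits per real symbol is k / n.
    print(mapping.__name__, symbols, len(bits) / len(symbols))
```

All three mappings are one-to-one, as required of the encoding function.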

10.4 Block-Mode Mapping of Bits to Real Numbers 

The examples we gave in Section 10.3 of mappings $\varphi\colon \{0,1\}^k \to \mathbb{R}^n$ have something
in common. In each of those examples the mapping can be described as follows: the
data bits $D_1, \ldots, D_k$ are first grouped into binary K-tuples; each K-tuple is then
mapped to a real N-tuple by applying some mapping $\mathrm{enc}\colon \{0,1\}^K \to \mathbb{R}^N$; and the
so-produced real N-tuples are then concatenated to form the sequence $X_1, \ldots, X_n$,
where $n = (k/K)\,N$.
[Figure 10.1: Block-mode encoding. The data bits $D_1, \ldots, D_K$; $D_{K+1}, \ldots, D_{2K}$; ...; $D_{k-K+1}, \ldots, D_k$ are fed block by block to enc(·), which produces $X_1, \ldots, X_N$; $X_{N+1}, \ldots, X_{2N}$; ...; $X_{n-N+1}, \ldots, X_n$, i.e., $\mathrm{enc}(D_1, \ldots, D_K)$, $\mathrm{enc}(D_{K+1}, \ldots, D_{2K})$, ..., $\mathrm{enc}(D_{k-K+1}, \ldots, D_k)$.]

In the first example K = N = 1 and the mapping of K-tuples to N-tuples is the 
mapping (10.2). In the second example K = 2 and N = 1 with the mapping (10.3). 
And in the third example K = 1 and N = 2 with the repetition mapping (10.4). 

To describe such mappings $\varphi\colon \{0,1\}^k \to \mathbb{R}^n$ more formally we need the notion of
a binary-to-reals block encoder, which we define next.

Definition 10.4.1 ((K,N) Binary-to-Reals Block Encoder). A (K,N) binary-to-reals
block encoder is a one-to-one mapping from the set of binary K-tuples to
the set of real N-tuples, where K and N are positive integers. The rate of a (K,N)
binary-to-reals block encoder is defined as

\[
  \frac{K}{N} \left[ \frac{\text{bit}}{\text{real symbol}} \right].
\]

Note that we shall sometimes omit the phrase "binary-to-reals" and refer to such 
an encoder as a (K, N) block encoder. Also note that "one-to-one" means that 
no two distinct binary K-tuples may be mapped to the same real N-tuple. 

We say that an encoder $\varphi\colon \{0,1\}^k \to \mathbb{R}^n$ operates in block-mode using the
(K,N) binary-to-reals block encoder enc(·) if

1) $k$ is divisible by K;

2) $n$ is given by $(k/K)\,N$; and

3) $\varphi(\cdot)$ maps the binary sequence $D_1, \ldots, D_k$ to the sequence $X_1, \ldots, X_n$ by
parsing the sequence $D_1, \ldots, D_k$ into consecutive length-K binary tuples and
by then concatenating the results of applying enc(·) to each such K-tuple as
in Figure 10.1.

If $k$ is not divisible by K, we often introduce zero padding. In this case we
choose $k'$ to be the smallest integer that is no smaller than $k$ and that is divisible
by K, i.e.,

\[
  k' = \Bigl\lceil \frac{k}{K} \Bigr\rceil K,
\]

(where for every $\xi \in \mathbb{R}$ we use $\lceil \xi \rceil$ to denote the smallest integer that is no smaller
than $\xi$, e.g., $\lceil 1.24 \rceil = 2$) and map $D_1, \ldots, D_k$ to the sequence $X_1, \ldots, X_{n'}$ where

\[
  n' = \frac{k'}{K}\, N
\]

by applying the (K,N) encoder in block-mode to the $k'$-length zero-padded binary
tuple

\[
  D_1, \ldots, D_k, \underbrace{0, \ldots, 0}_{k' - k \text{ zeros}} \tag{10.5}
\]

as in Figure 10.2.

[Figure 10.2: Block-mode encoding with zero padding. The zero-padded bits $D_1, \ldots, D_K$; $D_{K+1}, \ldots, D_{2K}$; ...; $D_{k'-K+1}, \ldots, D_k, 0, \ldots, 0$ are fed block by block to enc(·), which produces $X_1, \ldots, X_N$; $X_{N+1}, \ldots, X_{2N}$; ...; $X_{n'-N+1}, \ldots, X_{n'}$.]
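
Block-mode operation with zero padding can be sketched in a few lines of Python. The helper `encode_block_mode` and the example encoder `enc_pairs` (a (2,1) block encoder implementing the rule (10.3)) are hypothetical names introduced here for illustration.

```python
def encode_block_mode(bits, enc, K):
    """Apply a (K, N) block encoder `enc` in block-mode, zero padding if needed.

    `enc` maps a tuple of K bits to a tuple of N real numbers.
    """
    k = len(bits)
    k_padded = -(-k // K) * K  # ceil(k / K) * K using integer arithmetic
    padded = list(bits) + [0] * (k_padded - k)  # append k' - k zeros, as in (10.5)
    symbols = []
    for start in range(0, k_padded, K):
        symbols.extend(enc(tuple(padded[start:start + K])))
    return symbols


def enc_pairs(block):
    """A (2, 1) binary-to-reals block encoder implementing the rule (10.3)."""
    table = {(0, 0): (3,), (0, 1): (1,), (1, 1): (-3,), (1, 0): (-1,)}
    return table[block]


# k = 3 is not divisible by K = 2, so one zero is appended before encoding.
print(encode_block_mode([0, 1, 1], enc_pairs, 2))     # encodes (0,1) and (1,0)
print(encode_block_mode([0, 0, 1, 1], enc_pairs, 2))  # no padding needed
```

Note that with zero padding the receiver must know $k$ in order to discard the padded bits after decoding.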

10.5 From Real Numbers to Waveforms with Linear Modulation 

There are numerous ways to map a sequence of real numbers $X_1, \ldots, X_n$ to a
real-valued signal. Here we shall focus on mappings that have a linear structure. This
additional structure simplifies the implementation of the modulator and demodulator.
It will be described next.

Suppose we wish to modulate the $k$ data bits $D_1, \ldots, D_k$, and suppose that we
have mapped these bits to the $n$ real numbers $X_1, \ldots, X_n$. Here $n$ can be smaller than,
equal to, or greater than $k$. The transmitted waveform X(·) in a linear modulation
scheme is then given by

\[
  X(t) = A \sum_{\ell=1}^{n} X_\ell\, g_\ell(t), \quad t \in \mathbb{R}, \tag{10.6}
\]

where the deterministic real waveforms $g_1, \ldots, g_n$ are specified in advance, and
where $A > 0$ is a scaling factor. The waveform X(·) can thus be viewed as a
scaled-by-A linear combination of the tuple $(g_1, \ldots, g_n)$ with the coefficients $X_1, \ldots, X_n$:

\[
  X = A \sum_{\ell=1}^{n} X_\ell\, g_\ell. \tag{10.7}
\]

The transmitted energy is a random variable that is given by

\[
  \| X \|_2^2 = \int_{-\infty}^{\infty} X^2(t)\, dt
  = \int_{-\infty}^{\infty} \Bigl( A \sum_{\ell=1}^{n} X_\ell\, g_\ell(t) \Bigr)^2 dt
  = A^2 \sum_{\ell=1}^{n} \sum_{\ell'=1}^{n} X_\ell X_{\ell'} \int_{-\infty}^{\infty} g_\ell(t)\, g_{\ell'}(t)\, dt.
\]

The transmitted energy takes on a particularly simple form if the waveforms $g_\ell(\cdot)$
are orthonormal, i.e., if

\[
  \langle g_\ell, g_{\ell'} \rangle = I\{\ell = \ell'\}, \quad \ell, \ell' \in \{1, \ldots, n\}, \tag{10.8}
\]

in which case the energy is given by

\[
  \| X \|_2^2 = A^2 \sum_{\ell=1}^{n} X_\ell^2, \quad \{g_\ell\} \text{ orthonormal}. \tag{10.9}
\]



As an exercise, the reader is encouraged to verify that there is no loss in generality
in assuming that the waveforms $\{g_\ell\}$ are orthonormal. More precisely:

Theorem 10.5.1. Suppose that the waveform X(·) is generated from the binary
$k$-tuple $D_1, \ldots, D_k$ by applying the mapping $\varphi\colon \{0,1\}^k \to \mathbb{R}^n$ and by then linearly
modulating the resulting $n$-tuple $\varphi(D_1, \ldots, D_k)$ using the waveforms $\{g_\ell\}_{\ell=1}^{n}$ as in
(10.6).

Then there exist an integer $1 \le n' \le n$; a mapping $\varphi'\colon \{0,1\}^k \to \mathbb{R}^{n'}$; and $n'$
orthonormal signals $\{\phi_\ell\}_{\ell=1}^{n'}$ such that if X'(·) is generated from $D_1, \ldots, D_k$ by
applying linear modulation to $\varphi'(D_1, \ldots, D_k)$ using the orthonormal waveforms
$\{\phi_\ell\}_{\ell=1}^{n'}$, then X'(·) and X(·) are indistinguishable for every $k$-tuple $D_1, \ldots, D_k$.

Proof. The proof of this theorem is left as an exercise. □

Motivated by this theorem, we shall focus on linear modulation with orthonormal 
functions. But please note that even if the transmitted waveform satisfies (10.8), 
the received waveform might not. For example, the channel might consist of a 
linear filter that could destroy the orthogonality. 
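
The energy formula (10.9) can be checked numerically with a small Python sketch. As a hypothetical orthonormal family we take unit-height rectangular pulses on the disjoint unit-length intervals $[\ell - 1, \ell)$; the symbol values, the scaling factor, and the function names are likewise chosen only for illustration.

```python
def rect_pulse(ell):
    """g_ell: unit-height pulse on [ell - 1, ell); a hypothetical orthonormal family."""
    return lambda t: 1.0 if ell - 1 <= t < ell else 0.0


def inner_product(u, v, t_start, t_end, step=0.01):
    """Midpoint Riemann-sum approximation of the inner product of two real signals."""
    n = int(round((t_end - t_start) / step))
    return step * sum(u(t_start + (k + 0.5) * step) * v(t_start + (k + 0.5) * step)
                      for k in range(n))


def linear_modulation(a, symbols, pulses):
    """X(t) = A * sum_l X_l g_l(t), as in (10.6)."""
    return lambda t: a * sum(x * g(t) for x, g in zip(symbols, pulses))


a = 2.0                # illustrative scaling factor A
symbols = [1, -1, 3]   # illustrative real symbols X_1, X_2, X_3
pulses = [rect_pulse(ell) for ell in (1, 2, 3)]
x = linear_modulation(a, symbols, pulses)

# Orthonormality (10.8) of the chosen pulses.
print(inner_product(pulses[0], pulses[0], 0.0, 3.0))
print(inner_product(pulses[0], pulses[1], 0.0, 3.0))

# Energy computed by integration versus the closed form (10.9).
print(inner_product(x, x, 0.0, 3.0), a ** 2 * sum(s ** 2 for s in symbols))
```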



10.6 Recovering the Signal Coefficients with a Matched Filter 

Suppose now that the binary $k$-tuple $(D_1, \ldots, D_k)$ is mapped to the real $n$-tuple
$(X_1, \ldots, X_n)$ using the mapping

\[
  \varphi\colon \{0,1\}^k \to \mathbb{R}^n \tag{10.10}
\]

and that the $n$-tuple $(X_1, \ldots, X_n)$ is then mapped to the waveform

\[
  X(t) = A \sum_{\ell=1}^{n} X_\ell\, \phi_\ell(t), \quad t \in \mathbb{R}, \tag{10.11}
\]

where $\phi_1, \ldots, \phi_n$ are orthonormal:

\[
  \langle \phi_\ell, \phi_{\ell'} \rangle = I\{\ell = \ell'\}, \quad \ell, \ell' \in \{1, \ldots, n\}. \tag{10.12}
\]

How can we recover the $k$-tuple $D_1, \ldots, D_k$ from X(·)? The decoder's problem
is, of course, harder, because the decoder usually does not have access to the
transmitted waveform X(·) but only to the received waveform, which may be a
noisy and distorted version of X(·). Nevertheless, it is instructive to consider the
noiseless and distortionless problem first.

If we are able to recover the real numbers $\{X_\ell\}_{\ell=1}^{n}$ from the received signal X(·),
and if the mapping $\varphi\colon \{0,1\}^k \to \mathbb{R}^n$ is one-to-one (as we assume), then the data
bits $\{D_j\}_{j=1}^{k}$ can be reconstructed from X(·). Thus, the question is how to recover
$\{X_\ell\}_{\ell=1}^{n}$ from X(·). But this is easy if the functions $\{\phi_\ell\}_{\ell=1}^{n}$ are orthonormal,
because in this case, by Proposition 4.6.4 (i), $X_\ell$ is given by the scaled inner
product between X and $\phi_\ell$:

\[
  X_\ell = \frac{1}{A} \langle X, \phi_\ell \rangle, \quad \ell = 1, \ldots, n. \tag{10.13}
\]

Consequently, we can compute $X_\ell$ by feeding X to a matched filter for $\phi_\ell$ and
scaling the time-0 output by 1/A (Section 5.8). To recover $\{X_\ell\}_{\ell=1}^{n}$ we thus need $n$
matched filters, one matched to each of the waveforms $\{\phi_\ell\}$.

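The implementation becomes much simpler if the functions $\{\phi_\ell\}$ have an additional
structure, namely, if they are all time shifts of some function $\phi(\cdot)$:

\[
  \phi_\ell(t) = \phi(t - \ell T_s), \quad \ell \in \{1, \ldots, n\},\ t \in \mathbb{R}. \tag{10.14}
\]

In this case it follows from Corollary 5.8.3 that we can compute all the inner
products $\{\langle X, \phi_\ell \rangle\}$ using one matched filter of impulse response $\overleftarrow{\phi}$ (the mirror
image $t \mapsto \phi(-t)$ of $\phi$) by feeding X to the filter and sampling its output at the
appropriate times:

\[
\begin{aligned}
  X_\ell &= \frac{1}{A} \int_{-\infty}^{\infty} X(\tau)\, \phi_\ell(\tau)\, d\tau \\
         &= \frac{1}{A} \int_{-\infty}^{\infty} X(\tau)\, \phi(\tau - \ell T_s)\, d\tau \\
         &= \frac{1}{A} \int_{-\infty}^{\infty} X(\tau)\, \overleftarrow{\phi}(\ell T_s - \tau)\, d\tau \\
         &= \frac{1}{A} \bigl( X \star \overleftarrow{\phi} \bigr)(\ell T_s), \quad \ell = 1, \ldots, n.
\end{aligned} \tag{10.15}
\]

Figure 10.3 demonstrates how the symbols $\{X_\ell\}$ can be recovered from X(·) using
a single matched filter if the pulses $\{\phi_\ell\}$ satisfy (10.14).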
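
The recovery of the symbols with a single matched filter can be illustrated by the following Python sketch. It uses a hypothetical pulse, namely a rectangular pulse of duration $T_s$ and height $1/\sqrt{T_s}$, whose time shifts by integer multiples of $T_s$ are orthonormal. The convolution is approximated by a Riemann sum, and all numerical values and function names are illustrative assumptions.

```python
import math

TS = 0.5  # hypothetical baud period T_s
A = 2.0   # hypothetical scaling factor


def phi(t):
    """Unit-energy rectangular pulse on [0, T_s); its T_s-shifts are orthonormal."""
    return 1.0 / math.sqrt(TS) if 0.0 <= t < TS else 0.0


def mirrored_phi(t):
    """Impulse response of the matched filter: the mirror image of phi."""
    return phi(-t)


def transmit(symbols):
    """X(t) = A * sum_l X_l phi(t - l T_s), with l = 1, ..., n."""
    return lambda t: A * sum(s * phi(t - ell * TS)
                             for ell, s in enumerate(symbols, start=1))


def matched_filter_output(x, t0, step=0.001):
    """Approximate (x * mirrored_phi)(t0) = integral of x(tau) mirrored_phi(t0 - tau)."""
    n = int(round(TS / step))
    total = 0.0
    for k in range(n):
        tau = t0 + (k + 0.5) * step  # mirrored_phi(t0 - tau) vanishes outside [t0, t0 + T_s)
        total += x(tau) * mirrored_phi(t0 - tau)
    return step * total


symbols = [3, -1, 1, -3]
x = transmit(symbols)
recovered = [matched_filter_output(x, ell * TS) / A
             for ell in range(1, len(symbols) + 1)]
print(recovered)  # close to the transmitted symbols
```

Only one filter is needed here; the different symbols are obtained by sampling its output at the different times $\ell T_s$.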

10.7 Pulse Amplitude Modulation 

Under Assumption (10.14), the transmitted signal X(·) in (10.11) is given by

\[
  X(t) = A \sum_{\ell=1}^{n} X_\ell\, \phi(t - \ell T_s), \quad t \in \mathbb{R}, \tag{10.16}
\]

which is a special case of Pulse Amplitude Modulation (PAM), which we
describe next.

[Figure 10.3: Recovering the symbols from the transmitted waveform using a
matched filter when (10.14) is satisfied. The waveform X(·) is fed to a filter whose impulse response is the mirror image of $\phi$, and the filter's output is sampled at the times $\ell T_s$ to yield $A X_\ell$.]

In PAM, the data bits $D_1, \ldots, D_k$ are mapped to real numbers $X_1, \ldots, X_n$, which
are then mapped to the waveform

\[
  X(t) = A \sum_{\ell=1}^{n} X_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}, \tag{10.17}
\]

for some scaling factor $A > 0$, some function $g\colon \mathbb{R} \to \mathbb{R}$, and some constant $T_s > 0$.
The function $g$ (always assumed Borel measurable) is called the pulse shape; the
constant $T_s$ is called the baud period; and its reciprocal $1/T_s$ is called the baud
rate.$^1$ The units of $T_s$ are seconds, and one often refers to the units of $1/T_s$ as real
symbols per second. PAM can thus be viewed as a special case of linear modulation
(10.6) with $g_\ell$ being given for every $\ell \in \{1, \ldots, n\}$ by the mapping $t \mapsto g(t - \ell T_s)$.
The signal (10.16) can be viewed as a PAM signal where the pulse shape $\phi$ satisfies
the orthonormality condition (10.14).

In this book we shall typically denote the PAM pulse shape by g. But we shall
use $\phi$ if we assume an additional orthonormality condition such as (10.12). In this
case we shall refer to $1/T_s$ as having units of real dimensions per second:

$$\frac{1}{T_s} \left[\frac{\text{real dimension}}{\text{sec}}\right], \quad \phi \text{ satisfies (10.12)}. \tag{10.18}$$

Note that according to Theorem 10.5.1 there is no loss in generality in assuming
that the pulses $\{\phi_\ell\}$ are orthonormal. There is, however, a loss in generality in
assuming that they satisfy (10.14).



10.8 Constellations 

Recall that in PAM the data bits $D_1, \ldots, D_k$ are first mapped to the real n-tuple
$X_1, \ldots, X_n$ using a one-to-one mapping $\varphi\colon \{0,1\}^k \to \mathbb{R}^n$, and that these real
numbers are then mapped to the waveform $X(\cdot)$ via (10.17). Since there are only
$2^k$ different binary k-tuples, it follows that each symbol $X_\ell$ can take on at most
$2^k$ different values. The set of values that $X_\ell$ can take on may, in general, depend
on $\ell$. The union of all these sets (over $\ell \in \{1, \ldots, n\}$) is called the constellation of
the mapping $\varphi(\cdot)$. Denoting the constellation of $\varphi(\cdot)$ by $\mathcal{X}$, we thus have that a real
number x is in $\mathcal{X}$ if, and only if, for some choice of the binary k-tuple $(d_1, \ldots, d_k)$
and for some $\ell \in \{1, \ldots, n\}$ the $\ell$-th component of $\varphi((d_1, \ldots, d_k))$ is equal to x.

1 These terms honor the French engineer J.M.E. Baudot (1845-1903) who invented a telegraph
printing system.
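
The definition of the constellation as a union over all binary k-tuples and all positions can be sketched directly. The mapping `example_map` below is a hypothetical one-to-one mapping invented for illustration; it is not one of the mappings (10.2)-(10.4).

```python
from itertools import product

def constellation(encode, k):
    """Union, over all binary k-tuples and all positions, of the symbol values."""
    points = set()
    for bits in product((0, 1), repeat=k):
        points.update(encode(bits))
    return points

# Hypothetical one-to-one mapping from {0,1}^2 to R^2 (illustration only):
# the first bit selects +1 or -1, the second bit selects +3 or -3.
def example_map(bits):
    d1, d2 = bits
    return (1 - 2 * d1, 3 * (1 - 2 * d2))

print(sorted(constellation(example_map, 2)))  # [-3, -1, 1, 3]
```

Here the first component only ever takes the values ±1 and the second only ±3, so the set of values does depend on the position; the constellation is their union.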

For example, the constellation corresponding to the mapping (10.2) is the set
$\{-1, +1\}$; the constellation corresponding to (10.3) is the set $\{-3, -1, +1, +3\}$;
and the constellation corresponding to (10.4) is the set $\{-1, +1\}$. In all these
examples, the constellation can be viewed as a special case of the constellation
with $2\nu$ symbols

$$\bigl\{-(2\nu - 1), \ldots, -5, -3, -1, +1, +3, +5, \ldots, +(2\nu - 1)\bigr\} \tag{10.19}$$

for some positive integer $\nu$. A less prevalent constellation is the constellation

$$\{-2, -1, +1, +2\}. \tag{10.20}$$

The number of points in the constellation $\mathcal{X}$ is just $\#\mathcal{X}$, i.e., the number of
elements (cardinality) of the set $\mathcal{X}$.

The minimum distance $\delta$ of a constellation is the Euclidean distance between
the closest distinct elements in the constellation:

$$\delta = \min_{\substack{x, x' \in \mathcal{X} \\ x \neq x'}} |x - x'|. \tag{10.21}$$

The scaling of the constellation is arbitrary because of the scaling factor A in the
signal's description. Thus, the signal $A \sum_\ell X_\ell\, g(t - \ell T_s)$, where $X_\ell$ takes value in
the set $\{\pm 1\}$, is of constellation $\{-1, +1\}$, but it can also be expressed in the form
$A' \sum_\ell X'_\ell\, g(t - \ell T_s)$, where $A' = 2A$ and $X'_\ell$ takes value in the set $\{-1/2, +1/2\}$,
i.e., as a PAM signal of constellation $\{-1/2, +1/2\}$.

Different authors choose to normalize the constellation in different ways. One
common normalization is to express the elements of the constellation as multiples
of the minimum distance. Thus, we would represent the constellation $\{-1, +1\}$ as

$$\Bigl\{ -\tfrac{1}{2}\delta,\ +\tfrac{1}{2}\delta \Bigr\}$$

and the constellation $\{-3, -1, +1, +3\}$ as

$$\Bigl\{ -\tfrac{3}{2}\delta,\ -\tfrac{1}{2}\delta,\ +\tfrac{1}{2}\delta,\ +\tfrac{3}{2}\delta \Bigr\}.$$

The normalized version of the constellation (10.19) is

$$\Bigl\{ \pm\tfrac{2\nu - 1}{2}\delta,\ \ldots,\ \pm\tfrac{5}{2}\delta,\ \pm\tfrac{3}{2}\delta,\ \pm\tfrac{1}{2}\delta \Bigr\}. \tag{10.22}$$

The second moment of a constellation $\mathcal{X}$ is defined as

$$\frac{1}{\#\mathcal{X}} \sum_{x \in \mathcal{X}} x^2. \tag{10.23}$$

The second moment of the constellation in (10.22) is given by

$$\frac{1}{\#\mathcal{X}} \sum_{x \in \mathcal{X}} x^2 = \frac{1}{M}\, 2 \sum_{\eta=1}^{M/2} (2\eta - 1)^2\, \frac{\delta^2}{4} = \frac{1}{12}\,(M^2 - 1)\,\delta^2, \tag{10.24a}$$

where

$$M = 2\nu \tag{10.24b}$$

is the number of points in the constellation, and where (10.24a)-(10.24b) can be
verified using the identity

$$\sum_{\eta=1}^{\nu} (2\eta - 1)^2 = \frac{\nu}{3}\,(4\nu^2 - 1), \quad \nu = 1, 2, \ldots \tag{10.25}$$
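
The closed form (10.24a) is easy to check by brute force. The sketch below uses exact rational arithmetic; the function names `second_moment` and `normalized_pam` are hypothetical helpers introduced only for this check.

```python
from fractions import Fraction

def second_moment(points):
    """Average of x^2 over the points of the constellation, as in (10.23)."""
    return sum(Fraction(x) ** 2 for x in points) / len(points)

def normalized_pam(nu, delta=1):
    """The 2*nu points +-(2*eta - 1)*delta/2, eta = 1..nu, as in (10.22)."""
    half = Fraction(delta) / 2
    return [s * (2 * eta - 1) * half for eta in range(1, nu + 1) for s in (+1, -1)]

# Compare the brute-force second moment with (M^2 - 1) * delta^2 / 12.
for nu in (1, 2, 3, 8):
    M = 2 * nu
    assert second_moment(normalized_pam(nu)) == Fraction(M * M - 1, 12)
```

For instance, with $\nu = 2$ and $\delta = 2$ the constellation is $\{-3, -1, +1, +3\}$ and both sides equal 5.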



10.9 Design Considerations 

Designing a communication system employing PAM with a block encoder entails 
making choices. We need to choose the PAM parameters A, T s , and g, and we 
need to choose a (K, N) block encoder enc(-). These choices greatly influence the 
overall system characteristics such as the transmitted power, bandwidth, and the 
performance of the system in the presence of noise. To design a system well, we 
must understand the effect of the design choices on the overall system at three 
levels. At the first level we must understand which design parameters influence 
which overall system characteristics. At the second level we must understand 
how the design parameters influence the system. And at the third level we must 
understand how to choose the design parameters so as to optimize the system 
characteristics subject to the given constraints. 

In this book we focus on the first two levels. The third requires tools from
Information Theory and from Coding Theory that are beyond the scope of this book.
Here we offer a preview of the first level. We thus briefly and informally explain
which design choices influence which overall system properties.

To simplify the preview, we shall assume in this section that the time shifts of the
pulse shape by integer multiples of the baud period are orthonormal. Consequently,
we shall denote the pulse shape by $\phi$ and assume that (10.12) holds. We shall also
assume that k and n tend to infinity as in the bi-infinite block mode discussed in
Section 14.5.2. Roughly speaking this assumption is tantamount to the assumption
that the system has been running since time $-\infty$ and that it will continue running
until time $+\infty$.

Our discussion is extremely informal, and we apologize to the reader for discussing 
concepts that we have not yet defined. Readers who are aggravated by this practice 
may choose to skip this section; the issues will be revisited in Chapter 29 after 
everything has been defined and all the claims proved. 

The key observation we wish to highlight is that, to a great extent,
the choice of the block encoder enc(·) can be decoupled from the
choice of the pulse shape. The bandwidth and power spectral
density depend hardly at all on enc(·) and very much on the pulse
shape, whereas the probability of error on the white Gaussian noise
channel depends very much on enc(·) and not at all on the pulse
shape $\phi$.

This observation greatly simplifies the design problem because it means that, rather
than optimizing over $\phi$ and enc(·) jointly, we can choose each of them separately.

We next briefly discuss the different overall system characteristics and which design 
choices influence them. 



Data Rate: The data rate $R_b$ that the system supports is determined by the baud
period $T_s$ and by the rate K/N of the encoder. It is given by

$$R_b = \frac{1}{T_s}\, \frac{K}{N} \left[\frac{\text{bit}}{\text{sec}}\right].$$



Power: The transmitted power does not depend on the pulse shape $\phi$
(Theorem 14.5.2). It is determined by the amplitude A, the baud period $T_s$, and by
the block encoder enc(·). In fact, if the block encoder enc(·) is such that when it
is fed the data bits it produces zero-mean symbols that are uniformly distributed
over the constellation, then the transmitted power is determined by A, $T_s$, and the
second moment of the constellation only.

Power Spectral Density: If the block encoder enc(·) is such that when it is fed
the data bits it produces zero-mean and uncorrelated symbols of equal variance,
then the power spectral density is determined by A, $T_s$, and $\phi$ only; it is unaffected
by enc(·) (Section 15.4).



Bandwidth: The bandwidth of the transmitted waveform is equal to the bandwidth
of the pulse shape $\phi$ (Theorem 15.4.1). We will see in Chapter 11 that
for the orthonormality (10.12) to hold, the bandwidth W of the pulse shape must
satisfy

$$W \ge \frac{1}{2 T_s}.$$

In Chapter 11 we shall also see how to design $\phi$ so as to satisfy (10.12) and so as
to have its bandwidth as close as we wish to $1/(2T_s)$.²
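
As a small numerical illustration of how the baud period enters both the data rate and the bandwidth requirement, consider the following sketch. The function names and the numbers (a baud period of one millisecond and a rate-1/2 encoder) are hypothetical and chosen only for illustration.

```python
def data_rate(Ts, K, N):
    """Data rate in bits per second: (1/Ts) * (K/N)."""
    return (K / N) / Ts

def min_bandwidth(Ts):
    """Lower bound 1/(2*Ts), in Hz, on the bandwidth of a pulse whose
    time shifts by integer multiples of Ts are orthonormal."""
    return 1.0 / (2.0 * Ts)

Ts = 1e-3                         # hypothetical baud period in seconds
rate = data_rate(Ts, K=1, N=2)    # hypothetical rate-1/2 block encoder
w_min = min_bandwidth(Ts)
print(rate, w_min)
```

Halving the baud period doubles the data rate, but it also doubles the minimum bandwidth that the pulse shape must occupy.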



Probability of Error: It is a remarkable fact that the pulse shape $\phi$ does not affect
the performance of the system on the additive white Gaussian noise channel.
Performance is determined only by A, $T_s$, and the block encoder enc(·) (Section 26.5.2).

The preceding discussion focused on PAM, but many of the results also hold for
Quadrature Amplitude Modulation, which is discussed in Chapters 16, 18, and 28.

2 Information-theoretic considerations suggest that this is a good approach.

10.10 Some Implementation Considerations 

It is instructive to consider some of the issues related to the generation of a PAM
signal

$$X(t) = A \sum_{\ell=1}^{n} X_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}. \tag{10.26}$$

Here we focus on delay, causality, and digital implementation.

10.10.1 Delay 

To illustrate the delay issue in PAM, suppose that the pulse shape $g(\cdot)$ is strictly
positive. In this case we note that, irrespective of which epoch $t' \in \mathbb{R}$ we consider,
the calculation of $X(t')$ requires knowledge of the entire n-tuple $X_1, \ldots, X_n$. Since
the sequence $X_1, \ldots, X_n$ cannot typically be determined in its entirety unless the
entire sequence $D_1, \ldots, D_k$ is determined first, it follows that, when $g(\cdot)$ is strictly
positive, the modulator cannot produce $X(t')$ before observing the entire data
sequence $D_1, \ldots, D_k$. And this is true for any $t' \in \mathbb{R}$! Since in the back of our
minds we think about $D_1, \ldots, D_k$ as the data bits that will be sent during the
entire life of the system or, at least, from the moment it is turned on until it is
shut off, it is unrealistic to expect the modulator to observe the entire sequence
$D_1, \ldots, D_k$ before producing any input to the channel.

The engineering solution to this problem is to find some positive integer L such
that, for all practical purposes, $g(t)$ is zero whenever $|t| \ge L T_s$, i.e.,

$$g(t) \approx 0, \quad |t| \ge L T_s. \tag{10.27}$$

In this case we have that, irrespective of $t' \in \mathbb{R}$, only $2L + 1$ terms (approximately)
determine $X(t')$. Indeed, if $\kappa$ is an integer such that

$$\kappa T_s \le t' < (\kappa + 1) T_s, \tag{10.28}$$

then

$$X(t') \approx A \sum_{\ell = \max\{1,\, \kappa - L\}}^{\kappa + L} X_\ell\, g(t' - \ell T_s), \quad \kappa T_s \le t' < (\kappa + 1) T_s, \tag{10.29}$$

where the sum is assumed to be zero if $\kappa + L < 1$.

Thus, if (10.27) holds, then the approximate calculation of $X(t')$ can be performed
without knowledge of the entire sequence $X_1, \ldots, X_n$ and the modulator can start
producing the waveform $X(\cdot)$ as soon as it knows $X_1, \ldots, X_L$.
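
The effect of the truncation (10.29) can be sketched numerically. The assumptions here are loud ones: the Gaussian pulse `gauss`, the values of `A`, `Ts`, and `L`, and the symbol sequence are hypothetical, and the upper summation limit is additionally clipped at n so that only existing symbols are used.

```python
import math

def pam_exact(t, symbols, g, A=1.0, Ts=1.0):
    """Full sum (10.26): all n symbols contribute."""
    return A * sum(x * g(t - l * Ts) for l, x in enumerate(symbols, start=1))

def pam_truncated(t, symbols, g, L, A=1.0, Ts=1.0):
    """Truncated sum (10.29): only symbols with index near kappa = floor(t/Ts)."""
    n = len(symbols)
    kappa = math.floor(t / Ts)
    lo = max(1, kappa - L)
    hi = min(n, kappa + L)          # clipping at n is an addition to (10.29)
    # range() is empty when hi < lo, so the sum is then zero.
    return A * sum(symbols[l - 1] * g(t - l * Ts) for l in range(lo, hi + 1))

def gauss(t):
    """Hypothetical strictly positive pulse that decays very quickly."""
    return math.exp(-t * t)

symbols = [(-1) ** (l % 3) for l in range(1, 51)]   # hypothetical +-1 symbols
t_prime = 20.3
full = pam_exact(t_prime, symbols, gauss)
approx = pam_truncated(t_prime, symbols, gauss, L=6)
```

With this pulse, `full` and `approx` agree to within floating-point noise even though `approx` looks at no more than 13 of the 50 symbols. A pulse that decays slowly would need a much larger L for the same accuracy.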
10.10.2 Causality

The reader may object to the fact that, even if (10.27) holds, the signal $X(\cdot)$ may
be nonzero at negative times. It might therefore seem as though the transmitter
needs to transmit a signal before the system has been turned on and that, worse
still, this signal depends on the data bits that will be fed to the system in the
future when the system is turned on. But this is not really an issue. It all has
to do with how we define the epoch $t = 0$, i.e., to what physical time instant
does $t = 0$ correspond. We never said it corresponded to the instant when the
system was turned on and, in fact, there is no reason to set the time origin at
that time instant or at the "Big Bang." For example, we can set the time origin
at $L T_s$ seconds-past-system-turn-on, and the problem disappears. Similarly, if the
transmitted waveform depends on $X_1, \ldots, X_L$, and if these real numbers can only
be computed once the data bits $D_1, \ldots, D_\kappa$ have been fed to the encoder, then it
would make sense to set the time origin to the moment at which the last of these $\kappa$
data bits has been fed to the encoder.

Some problems in Digital Communications that appear like tough causality problems
end up being easily solved by time delays and the redefinition of the time
origin. Others can be much harder. It is sometimes difficult for the novice to
determine which causality problem is of the former type and which of the latter. As
a rule of thumb, you should be extra cautious when the system contains feedback
loops.



10.10.3 Digital Implementation 

Even when all the symbols among $X_1, \ldots, X_n$ that are relevant for the calculation
of $X(t')$ are known, the actual computation may be tricky, particularly if the
formula describing the pulse shape is difficult to implement in hardware. In such
cases one may opt for a digital implementation using look-up tables. The idea is
to compute only samples of $X(\cdot)$ and to then interpolate using a digital-to-analog
(D/A) converter and an anti-aliasing filter. The samples must be computed at a
rate determined by the Sampling Theorem, i.e., at least once every $1/(2W)$ seconds,
where W is the bandwidth of the pulse shape.

The computation of the values of $X(\cdot)$ at its samples can be done by choosing L
sufficiently large so that (10.27) holds and by then approximating the sum (10.26)
for $t'$ satisfying (10.28) by the sum (10.29). The samples of this latter sum can be
computed with a digital computer or, as is more common if the symbols take on a
finite (and small) number of values, using a pre-programmed look-up table. The
size of the look-up table thus depends on two parameters: the number of samples
one needs to compute every $T_s$ seconds (determined via the bandwidth of $g(\cdot)$ and
the Sampling Theorem), and the number of addresses needed (as determined by L
and by the constellation size).
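
The look-up-table idea can be sketched as follows. This is a minimal illustration under assumptions that are not in the text: the table is addressed by the window of $2L + 1$ symbols appearing in (10.29), the pulse `gauss` and all parameter values are hypothetical, and edge effects near the first and last symbols are ignored.

```python
import itertools
import math

def build_lut(constellation, g, L, samples_per_baud, A=1.0, Ts=1.0):
    """Map every window (X_{kappa-L}, ..., X_{kappa+L}) to the samples of the
    truncated sum (10.29) at the epochs kappa*Ts + i*Ts/samples_per_baud."""
    offsets = [i * Ts / samples_per_baud for i in range(samples_per_baud)]
    lut = {}
    for window in itertools.product(constellation, repeat=2 * L + 1):
        # window[j] plays the role of X_{kappa - L + j}, so that
        # t' - l*Ts = off - (j - L)*Ts with t' = kappa*Ts + off.
        lut[window] = [
            A * sum(x * g(off - (j - L) * Ts) for j, x in enumerate(window))
            for off in offsets
        ]
    return lut

def gauss(t):
    """Hypothetical pulse shape, used only for illustration."""
    return math.exp(-t * t)

lut = build_lut((-1, +1), gauss, L=2, samples_per_baud=4)
print(len(lut))   # 32 addresses, i.e. 2 ** (2*2 + 1)
```

The table has (constellation size) to the power $2L + 1$ addresses, each holding `samples_per_baud` values, which makes concrete why its size is governed by L, by the constellation size, and by the sampling rate.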
10.11 Exercises

Exercise 10.1 (Exploiting Orthogonality). Let the energy-limited real signals $\phi_1$ and $\phi_2$
be orthogonal, and let $A^{(1)}$ and $A^{(2)}$ be positive constants. Let the waveform X be given
by

$$X = \bigl(A^{(1)} X^{(1)} + A^{(2)} X^{(2)}\bigr)\phi_1 + \bigl(A^{(1)} X^{(1)} - A^{(2)} X^{(2)}\bigr)\phi_2,$$

where $X^{(1)}$ and $X^{(2)}$ are unknown real numbers. How can you recover $X^{(1)}$ and $X^{(2)}$
from X?

Exercise 10.2 (More Orthogonality). Extend Exercise 10.1 to the case where $\phi_1, \ldots, \phi_\eta$
are orthonormal;

$$X = \bigl(a^{(1,1)} A^{(1)} X^{(1)} + \cdots + a^{(\eta,1)} A^{(\eta)} X^{(\eta)}\bigr)\phi_1 + \cdots
+ \bigl(a^{(1,\eta)} A^{(1)} X^{(1)} + \cdots + a^{(\eta,\eta)} A^{(\eta)} X^{(\eta)}\bigr)\phi_\eta;$$

and where the real numbers $a^{(\iota,\nu)}$ for $\iota, \nu \in \{1, \ldots, \eta\}$ satisfy the orthogonality condition



Exercise 10.3 (A Constellation and its Second Moment). What is the constellation
corresponding to the (1,3) binary-to-reals block encoder that maps 0 to (+1, +2, +2) and
maps 1 to (−1, −2, −2)? What is its second moment? Let the real symbols $(X_\ell,\ \ell \in \mathbb{Z})$
be generated from IID random bits $(D_j,\ j \in \mathbb{Z})$ in block mode using this block encoder.
Compute

$$\lim_{L \to \infty} \frac{1}{2L + 1} \sum_{\ell=-L}^{L} \mathsf{E}\bigl[X_\ell^2\bigr].$$


Exercise 10.4 (Orthonormal Signal Representation). Prove Theorem 10.5.1. 
Hint: Recall the Gram- Schmidt procedure. 

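Exercise 10.5 (Unbounded PAM Signal). Consider the formal expression

$$X(t) = \sum_{\ell=-\infty}^{\infty} X_\ell\, \operatorname{sinc}\Bigl(\frac{t}{T_s} - \ell\Bigr), \quad t \in \mathbb{R}.$$

(i) Show that even if the $X_\ell$'s can only take on the values ±1, the value of $X(T_s/2)$
can be arbitrarily high. That is, find a sequence $\{x_\ell\}_{\ell=-\infty}^{\infty}$ such that $x_\ell \in \{+1, -1\}$
for every $\ell \in \mathbb{Z}$ and

$$\lim_{L \to \infty} \sum_{\ell=-L}^{L} x_\ell\, \operatorname{sinc}\Bigl(\frac{1}{2} - \ell\Bigr) = \infty.$$

(ii) Suppose now that $g\colon \mathbb{R} \to \mathbb{R}$ satisfies

$$|g(t)| \le \frac{\beta}{1 + |t/T_s|^{1+\alpha}}, \quad t \in \mathbb{R},$$

for some $\alpha, \beta > 0$. Show that if for some $\gamma > 0$ we have $|x_\ell| \le \gamma$ for all $\ell \in \mathbb{Z}$, then
the sum

$$\sum_{\ell=-\infty}^{\infty} x_\ell\, g(t - \ell T_s)$$

converges at every t and is a bounded function of t.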

Exercise 10.6 (Etymology). Let g be an integrable real signal. Express the frequency 
response of the matched filter for g in terms of the FT of g. Repeat when g is a complex 
signal. Can you guess the origin of the term "Matched Filter"? 

Hint: Recall the notion of a "matched impedance."

Exercise 10.7 (Recovering the Symbols from a Filtered PAM Signal). Let $X(\cdot)$ be the
PAM signal (10.17), where $A > 0$, and where $g(t)$ is zero for $|t| > T_s/2$ and positive for
$|t| < T_s/2$.

(i) Suppose that $X(\cdot)$ is fed to a filter of impulse response $h\colon t \mapsto \mathrm{I}\{|t| < T_s/2\}$. Is
it true that for every $\ell \in \{1, \ldots, n\}$ one can recover $X_\ell$ from the filter's output at
time $\ell T_s$? If so, how?

(ii) Suppose now that the filter's impulse response is $h\colon t \mapsto \mathrm{I}\{-T_s/2 < t < 3T_s/4\}$.
Can one always recover $X_\ell$ from the filter's output at time $\ell T_s$? Can one recover
the sequence $(X_1, \ldots, X_n)$ from the n samples of the filter's output at the times
$T_s, \ldots, n T_s$?

Exercise 10.8 (Continuous Phase Modulation). In Continuous Phase Modulation (CPM)
the symbols $(X_\ell)$ are mapped to the waveform

$$X(t) = A \cos\Bigl(2\pi f_c t + 2\pi h \sum_{\ell=-\infty}^{\infty} X_\ell\, q(t - \ell T_s)\Bigr), \quad t \in \mathbb{R},$$

where $f_c, h > 0$ are constants and q is a mapping from $\mathbb{R}$ to $\mathbb{R}$. Is CPM a special case of
linear modulation?



Chapter 11 

Nyquist's Criterion 

11.1 Introduction 

In Section 10.7 we discussed the benefit of choosing the pulse shape $\phi$ in Pulse
Amplitude Modulation so that its time shifts by integer multiples of the baud
period $T_s$ be orthonormal. We saw that if the real transmitted signal is given by

$$X(t) = A \sum_{\ell=1}^{n} X_\ell\, \phi(t - \ell T_s), \quad t \in \mathbb{R},$$

where for all integers $\ell, \ell' \in \{1, \ldots, n\}$

$$\int_{-\infty}^{\infty} \phi(t - \ell T_s)\, \phi(t - \ell' T_s)\, dt = \mathrm{I}\{\ell = \ell'\},$$

then

$$X_\ell = \frac{1}{A} \int_{-\infty}^{\infty} X(t)\, \phi(t - \ell T_s)\, dt, \quad \ell = 1, \ldots, n,$$

and all the inner products

$$\int_{-\infty}^{\infty} X(t)\, \phi(t - \ell T_s)\, dt, \quad \ell = 1, \ldots, n$$

can be computed using one circuit by feeding the signal $X(\cdot)$ to a matched filter of
impulse response $\overleftarrow{\phi}$ and sampling the output at the times $t = \ell T_s$, for $\ell = 1, \ldots, n$.
(In the complex case the matched filter is of impulse response $\overleftarrow{\phi}^{*}$.)

In this chapter we shall address the design of and the limitations on signals that are
orthogonal to their time-shifts. While our focus so far has been on real functions $\phi$,
for reasons that will become apparent in Chapter 16 when we discuss Quadrature
Amplitude Modulation, we prefer to generalize the discussion and allow $\phi$ to be
complex. The main results of this chapter are Corollary 11.3.4 and Corollary 11.3.5.

An obvious way of choosing a signal $\phi$ that is orthogonal to its time shifts by
nonzero integer multiples of $T_s$ is by choosing a pulse that is zero outside some
interval of length $T_s$, say $[-T_s/2, T_s/2)$. This guarantees that the pulse and its
time shifts by nonzero integer multiples of $T_s$ do not overlap in time and that they
are thus orthogonal. But this choice limits us to pulses of infinite bandwidth,
because no nonzero bandlimited signal can vanish outside a finite (time) interval
(Theorem 6.8.2).

Fortunately, as we shall see, there exist signals that are orthogonal to their time
shifts and that are also bandlimited. This does not contradict Theorem 6.8.2
because these signals are not time-limited. They are orthogonal to their time
shifts in spite of overlapping with them in time.

Since we have in mind using the pulse to send a very large number of symbols n
(where n corresponds to the number of symbols sent during the lifetime of the
system) we shall strengthen the orthonormality requirement to

$$\int_{-\infty}^{\infty} \phi(t - \ell T_s)\, \phi^{*}(t - \ell' T_s)\, dt = \mathrm{I}\{\ell = \ell'\}, \quad \text{for all integers } \ell, \ell' \tag{11.1}$$

and not only to those $\ell, \ell'$ in $\{1, \ldots, n\}$. We shall refer to Condition (11.1) as
saying that "the time shifts of $\phi$ by integer multiples of $T_s$ are orthonormal."

Condition (11.1) can also be phrased as a condition on $\phi$'s self-similarity function,
which we introduce next.

11.2 The Self-Similarity Function of Energy-Limited Signals 

We next introduce the self-similarity function of energy-limited signals. This
term is not standard; more common in the literature is the term "autocorrelation
function." I prefer "self-similarity function," which was proposed to me by Jim
Massey, because it reduces the risk of confusion with the autocovariance function
and the autocorrelation function of stochastic processes. There is nothing random
in our current setup.

Definition 11.2.1 (Self-Similarity Function). The self-similarity function $R_{vv}$
of an energy-limited signal $v \in \mathcal{L}_2$ is defined as the mapping

$$R_{vv}\colon \tau \mapsto \int_{-\infty}^{\infty} v(t + \tau)\, v^{*}(t)\, dt, \quad \tau \in \mathbb{R}. \tag{11.2}$$

If v is real, then the self-similarity function has a nice pictorial interpretation: one
plots the original signal and the result of shifting the signal by $\tau$ on the same graph,
and one then takes the pointwise product and integrates over time.
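
The definition (11.2) can be explored numerically. The sketch below approximates the integral by a midpoint Riemann sum; the two test signals (`rect` and `modulated_rect`), the integration limits, and the step count are hypothetical choices for illustration only.

```python
import cmath
import math

def self_similarity(v, tau, t_lo=-2.0, t_hi=3.0, steps=20000):
    """Midpoint Riemann sum for the integral of v(t + tau) * conj(v(t)) dt
    over [t_lo, t_hi], which must contain the support of both factors."""
    dt = (t_hi - t_lo) / steps
    total = 0j
    for i in range(steps):
        t = t_lo + (i + 0.5) * dt
        total += v(t + tau) * complex(v(t)).conjugate()
    return total * dt

def rect(t):
    """Hypothetical real signal: unit-height rectangle on [0, 1)."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def modulated_rect(t):
    """Hypothetical complex signal: the rectangle times exp(i*2*pi*t)."""
    return rect(t) * cmath.exp(2j * math.pi * t)

r0 = self_similarity(rect, 0.0)       # energy of rect, close to 1
r_half = self_similarity(rect, 0.5)   # overlap of the two rectangles, close to 0.5
```

For the rectangle the result is (up to discretization error) the triangle $\tau \mapsto \max\{0, 1 - |\tau|\}$, which matches the pictorial interpretation: the product of the signal and its shifted copy is nonzero only where the two rectangles overlap.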

The main properties of the self-similarity function are summarized in the following 
proposition. 

Proposition 11.2.2 (Properties of the Self-Similarity Function). Let $R_{vv}$ be the
self-similarity function of some energy-limited signal $v \in \mathcal{L}_2$.

(i) Value at zero:

$$R_{vv}(0) = \int_{-\infty}^{\infty} |v(t)|^2\, dt. \tag{11.3}$$

(ii) Maximum at zero:

$$|R_{vv}(\tau)| \le R_{vv}(0), \quad \tau \in \mathbb{R}. \tag{11.4}$$

(iii) Conjugate symmetry:

$$R_{vv}(-\tau) = R_{vv}^{*}(\tau), \quad \tau \in \mathbb{R}. \tag{11.5}$$

(iv) Integral representation:

$$R_{vv}(\tau) = \int_{-\infty}^{\infty} |\hat{v}(f)|^2\, e^{i 2\pi f \tau}\, df, \quad \tau \in \mathbb{R}, \tag{11.6}$$

where $\hat{v}$ is the $\mathcal{L}_2$-Fourier Transform of v.

(v) Uniform Continuity: $R_{vv}$ is uniformly continuous.

(vi) Convolution Representation:

$$R_{vv}(\tau) = \bigl(v \star \overleftarrow{v}^{*}\bigr)(\tau), \quad \tau \in \mathbb{R}. \tag{11.7}$$

Proof. Part (i) follows by substituting $\tau = 0$ in (11.2).

Part (ii) follows by noting that $R_{vv}(\tau)$ is the inner product between the mapping
$t \mapsto v(t + \tau)$ and the mapping $t \mapsto v(t)$; by the Cauchy-Schwarz Inequality; and by
noting that both of the above mappings have the same energy, namely, the energy
of v:

$$\begin{aligned}
|R_{vv}(\tau)| &= \biggl| \int_{-\infty}^{\infty} v(t + \tau)\, v^{*}(t)\, dt \biggr| \\
&\le \biggl( \int_{-\infty}^{\infty} |v(t + \tau)|^2\, dt \biggr)^{1/2} \biggl( \int_{-\infty}^{\infty} |v^{*}(t)|^2\, dt \biggr)^{1/2} \\
&= \|v\|_2^2 \\
&= R_{vv}(0), \quad \tau \in \mathbb{R}.
\end{aligned}$$

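Part (iii) follows from the substitution $s = t + \tau$ in the following:

$$\begin{aligned}
R_{vv}(\tau) &= \int_{-\infty}^{\infty} v(t + \tau)\, v^{*}(t)\, dt \\
&= \int_{-\infty}^{\infty} v(s)\, v^{*}(s - \tau)\, ds \\
&= \biggl( \int_{-\infty}^{\infty} v(s - \tau)\, v^{*}(s)\, ds \biggr)^{*} \\
&= R_{vv}^{*}(-\tau), \quad \tau \in \mathbb{R}.
\end{aligned}$$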

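Part (iv) follows from the representation of $R_{vv}(\tau)$ as the inner product between
the mapping $t \mapsto v(t + \tau)$ and the mapping $t \mapsto v(t)$; by Parseval's Theorem;
and by noting that the $\mathcal{L}_2$-Fourier Transform of the mapping $t \mapsto v(t + \tau)$ is the
(equivalence class of the) mapping $f \mapsto e^{i 2\pi f \tau}\, \hat{v}(f)$:

$$\begin{aligned}
R_{vv}(\tau) &= \int_{-\infty}^{\infty} v(t + \tau)\, v^{*}(t)\, dt \\
&= \bigl\langle t \mapsto v(t + \tau),\ t \mapsto v(t) \bigr\rangle \\
&= \bigl\langle f \mapsto e^{i 2\pi f \tau}\, \hat{v}(f),\ f \mapsto \hat{v}(f) \bigr\rangle \\
&= \int_{-\infty}^{\infty} e^{i 2\pi f \tau}\, |\hat{v}(f)|^2\, df, \quad \tau \in \mathbb{R}.
\end{aligned}$$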

Part (v) follows from the integral representation of Part (iv) and from the
integrability of the function $f \mapsto |\hat{v}(f)|^2$. See, for example, the proof of (Katznelson,
1976, Section VI, Theorem 1.2).

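Part (vi) follows from the substitution $s = t + \tau$ and by rearranging terms:

$$\begin{aligned}
R_{vv}(\tau) &= \int_{-\infty}^{\infty} v(t + \tau)\, v^{*}(t)\, dt \\
&= \int_{-\infty}^{\infty} v(s)\, v^{*}(s - \tau)\, ds \\
&= \int_{-\infty}^{\infty} v(s)\, \overleftarrow{v}^{*}(\tau - s)\, ds \\
&= \bigl(v \star \overleftarrow{v}^{*}\bigr)(\tau). \qquad \Box
\end{aligned}$$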

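With the above definition we can restate the orthonormality condition (11.1) in
terms of the self-similarity function $R_{\phi\phi}$ of $\phi$:

Proposition 11.2.3 (Shift-Orthonormality and Self-Similarity). If $\phi$ is
energy-limited, then the shift-orthonormality condition

$$\int_{-\infty}^{\infty} \phi(t - \ell T_s)\, \phi^{*}(t - \ell' T_s)\, dt = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z} \tag{11.8}$$

is equivalent to the condition

$$R_{\phi\phi}(\ell T_s) = \mathrm{I}\{\ell = 0\}, \quad \ell \in \mathbb{Z}. \tag{11.9}$$

Proof. The proposition follows by substituting $s = t - \ell' T_s$ in the LHS of (11.8)
to obtain

$$\begin{aligned}
\int_{-\infty}^{\infty} \phi(t - \ell T_s)\, \phi^{*}(t - \ell' T_s)\, dt &= \int_{-\infty}^{\infty} \phi\bigl(s + (\ell' - \ell) T_s\bigr)\, \phi^{*}(s)\, ds \\
&= R_{\phi\phi}\bigl((\ell' - \ell) T_s\bigr). \qquad \Box
\end{aligned}$$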

At this point, Proposition 11.2.3 does not seem particularly helpful because
Condition (11.9) is not easy to verify. But, as we shall see in the next section, this
condition can be phrased very elegantly in the frequency domain.

11.3 Nyquist's Criterion

Definition 11.3.1 (Nyquist Pulse). We say that a complex signal $v\colon \mathbb{R} \to \mathbb{C}$ is a
Nyquist Pulse of parameter $T_s$ if

$$v(\ell T_s) = \mathrm{I}\{\ell = 0\}, \quad \ell \in \mathbb{Z}. \tag{11.10}$$

Theorem 11.3.2 (Nyquist's Criterion). Let $T_s > 0$ be given, and let the signal $v(\cdot)$
be given by

$$v(t) = \int_{-\infty}^{\infty} g(f)\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}, \tag{11.11}$$

for some integrable function $g\colon f \mapsto g(f)$. Then $v(\cdot)$ is a Nyquist Pulse of
parameter $T_s$ if, and only if,

$$\lim_{J \to \infty} \int_{-1/(2T_s)}^{1/(2T_s)} \biggl| T_s - \sum_{j=-J}^{J} g\Bigl(f + \frac{j}{T_s}\Bigr) \biggr|\, df = 0. \tag{11.12}$$



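Note 11.3.3. Condition (11.12) is sometimes written imprecisely¹ in the form

$$\sum_{j=-\infty}^{\infty} g\Bigl(f + \frac{j}{T_s}\Bigr) = T_s, \quad -\frac{1}{2T_s} \le f \le \frac{1}{2T_s}, \tag{11.13}$$

or, in view of the periodicity of the LHS of (11.13), as

$$\sum_{j=-\infty}^{\infty} g\Bigl(f + \frac{j}{T_s}\Bigr) = T_s, \quad f \in \mathbb{R}. \tag{11.14}$$

Neither form is mathematically precise.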

Proof. We will show that $v(-\ell T_s)$ is the $\ell$-th Fourier Series Coefficient of the
function²

$$f \mapsto \frac{1}{\sqrt{T_s}} \sum_{j=-\infty}^{\infty} g\Bigl(f + \frac{j}{T_s}\Bigr), \quad -\frac{1}{2T_s} \le f \le \frac{1}{2T_s}. \tag{11.15}$$

It will then follow that the condition that v is a Nyquist Pulse of parameter $T_s$ is
equivalent to the condition that the function in (11.15) has Fourier Series
Coefficients that are all zero except for the zeroth coefficient, which is one. The theorem
will then follow by noting that a function is indistinguishable from a constant if,
and only if, all but its zeroth Fourier Series Coefficient are zero. (This can be
proved by applying Theorem A.2.3 with $g_1$ chosen as the constant function.) The
value of the constant can be computed from the zeroth Fourier Series Coefficient.
To conclude the proof we thus need to relate $v(-\ell T_s)$ to the $\ell$-th Fourier Series
Coefficient of the function in (11.15). The calculation is straightforward: for every
integer $\ell$,

$$\begin{aligned}
v(-\ell T_s) &= \int_{-\infty}^{\infty} g(f)\, e^{-i 2\pi f \ell T_s}\, df \\
&= \sum_{j=-\infty}^{\infty} \int_{\frac{j}{T_s} - \frac{1}{2T_s}}^{\frac{j}{T_s} + \frac{1}{2T_s}} g(f)\, e^{-i 2\pi f \ell T_s}\, df \\
&= \sum_{j=-\infty}^{\infty} \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} g\Bigl(\tilde{f} + \frac{j}{T_s}\Bigr)\, e^{-i 2\pi (\tilde{f} + \frac{j}{T_s}) \ell T_s}\, d\tilde{f} \\
&= \sum_{j=-\infty}^{\infty} \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} g\Bigl(\tilde{f} + \frac{j}{T_s}\Bigr)\, e^{-i 2\pi \tilde{f} \ell T_s}\, d\tilde{f} \\
&= \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} \sum_{j=-\infty}^{\infty} g\Bigl(\tilde{f} + \frac{j}{T_s}\Bigr)\, e^{-i 2\pi \tilde{f} \ell T_s}\, d\tilde{f} \\
&= \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} \biggl( \frac{1}{\sqrt{T_s}} \sum_{j=-\infty}^{\infty} g\Bigl(\tilde{f} + \frac{j}{T_s}\Bigr) \biggr) \sqrt{T_s}\, e^{-i 2\pi \tilde{f} \ell T_s}\, d\tilde{f},
\end{aligned} \tag{11.16}$$

which is the $\ell$-th Fourier Series Coefficient of the function in (11.15). Here the first
equality follows by substituting $-\ell T_s$ for t in (11.11); the second by partitioning the
region of integration into intervals of length $1/T_s$; the third by the change of variable
$\tilde{f} = f - j/T_s$; the fourth by the periodicity of the complex exponentials; the fifth by
Fubini's Theorem, which allows us to swap the order of summation and integration;
and the final equality by multiplying and dividing by $\sqrt{T_s}$. □

1 There is no guarantee that the sum converges at every frequency f.

2 Since, by hypothesis, g is integrable, it follows that the sum in (11.15) converges in the $\mathcal{L}_1$
sense, i.e., that there exists some integrable function $s_\infty$ such that

$$\lim_{J \to \infty} \int_{-1/(2T_s)}^{1/(2T_s)} \biggl| s_\infty(f) - \sum_{j=-J}^{J} g\Bigl(f + \frac{j}{T_s}\Bigr) \biggr|\, df = 0.$$

By writing $\sum_{j=-\infty}^{\infty} g\bigl(f + \frac{j}{T_s}\bigr)$ we are referring to this function $s_\infty$.

An example of a function $f \mapsto g(f)$ satisfying (11.12) is plotted in Figure 11.1.

Figure 11.1: A function $g(\cdot)$ satisfying (11.12).

Corollary 11.3.4 (Characterization of Shift-Orthonormal Pulses). Let $\phi\colon \mathbb{R} \to \mathbb{C}$
be energy-limited and let $T_s$ be positive. Then the condition

$$\int_{-\infty}^{\infty} \phi(t - \ell T_s)\, \phi^{*}(t - \ell' T_s)\, dt = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z} \tag{11.17}$$

is equivalent to the condition

$$\sum_{j=-\infty}^{\infty} \Bigl| \hat{\phi}\Bigl(f + \frac{j}{T_s}\Bigr) \Bigr|^2 \equiv T_s, \tag{11.18}$$

i.e., to the condition that the set of frequencies $f \in \mathbb{R}$ for which the LHS of (11.18)
is not equal to $T_s$ is of Lebesgue measure zero.³

3 It is a simple technical matter to verify that the question as to whether or not (11.18) is
satisfied outside a set of frequencies of Lebesgue measure zero does not depend on which element
in the equivalence class of the $\mathcal{L}_2$-Fourier Transform of $\phi$ is considered.

Proof. By Proposition 11.2.3, Condition (11.17) can be equivalently expressed in
terms of the self-similarity function as

$$R_{\phi\phi}(m T_s) = \mathrm{I}\{m = 0\}, \quad m \in \mathbb{Z}. \tag{11.19}$$

The result now follows from the integral representation of the self-similarity
function $R_{\phi\phi}$ (Proposition 11.2.2 (iv)) and from Theorem 11.3.2 (with the additional
simplification that for every $j \in \mathbb{Z}$ the function $f \mapsto |\hat{\phi}(f + j/T_s)|^2$ is nonnegative, so
the sum on the LHS of (11.18) converges (possibly to $+\infty$) for every $f \in \mathbb{R}$). □
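
Condition (11.18) can be checked numerically for a concrete choice of $|\hat{\phi}(f)|^2$. The sketch below assumes, purely for illustration, a raised-cosine spectrum with roll-off parameter `beta`; this particular spectrum, the function names, and the parameter values are not taken from the text.

```python
import math

def raised_cosine(f, Ts=1.0, beta=0.5):
    """Hypothetical choice of |phi_hat(f)|^2: a raised-cosine spectrum with
    roll-off 0 < beta <= 1, flat up to (1-beta)/(2*Ts) and zero beyond
    (1+beta)/(2*Ts)."""
    a = abs(f)
    f1 = (1.0 - beta) / (2.0 * Ts)
    f2 = (1.0 + beta) / (2.0 * Ts)
    if a <= f1:
        return Ts
    if a <= f2:
        return 0.5 * Ts * (1.0 + math.cos(math.pi * Ts / beta * (a - f1)))
    return 0.0

def folded_sum(spectrum, f, Ts=1.0, J=3):
    """Partial sum over j = -J..J of spectrum(f + j/Ts), the LHS of (11.18)."""
    return sum(spectrum(f + j / Ts) for j in range(-J, J + 1))

# The folded sum should equal Ts at every frequency in [-1/(2Ts), 1/(2Ts)].
worst = max(
    abs(folded_sum(raised_cosine, -0.5 + i / 200.0) - 1.0) for i in range(201)
)
```

Since the spectrum vanishes for $|f| > (1 + \beta)/(2T_s)$, only finitely many terms of the sum are nonzero, so the partial sum with `J=3` already equals the full sum on the interval of interest.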

An extremely important consequence of Corollary 11.3.4 is the following corollary
about the minimum bandwidth of a pulse $\phi$ satisfying the orthonormality condition
(11.1).

Corollary 11.3.5 (Minimum Bandwidth of Shift-Orthonormal Pulses). Let $T_s > 0$
be fixed, and let $\phi$ be an energy-limited signal that is bandlimited to W Hz. If the
time shifts of $\phi$ by integer multiples of $T_s$ are orthonormal, then

$$W \ge \frac{1}{2T_s}. \tag{11.20}$$

Equality is achieved if

$$\bigl|\hat{\phi}(f)\bigr| = \sqrt{T_s}\; \mathrm{I}\Bigl\{ |f| \le \frac{1}{2T_s} \Bigr\}, \quad f \in \mathbb{R} \tag{11.21}$$

and, in particular, by the sinc(·) pulse

$$\phi(t) = \frac{1}{\sqrt{T_s}}\, \operatorname{sinc}\Bigl(\frac{t}{T_s}\Bigr), \quad t \in \mathbb{R} \tag{11.22}$$

or any time-shift thereof.



Proof. Figure 11.2 illustrates why $\phi$ cannot satisfy (11.18) if (11.20) is violated.
The figure should also convince you of the conditions for equality in (11.20).

For the algebraically-inclined readers we prove the corollary by showing that if
$W \le 1/(2T_s)$, then (11.18) can only be satisfied if $\phi$ satisfies (11.21) (outside a set
of frequencies of Lebesgue measure zero).⁴ To see this, consider the sum

$$\sum_{j=-\infty}^{\infty} \Bigl| \hat{\phi}\Bigl(f + \frac{j}{T_s}\Bigr) \Bigr|^2 \tag{11.23}$$

for frequencies f in the open interval $\bigl(-\frac{1}{2T_s}, +\frac{1}{2T_s}\bigr)$. The key observation in the
proof is that for frequencies in this open interval, if $W \le 1/(2T_s)$, then all the terms
in the sum (11.23) are zero, except for the $j = 0$ term. That is,

$$\sum_{j=-\infty}^{\infty} \Bigl| \hat{\phi}\Bigl(f + \frac{j}{T_s}\Bigr) \Bigr|^2 = \bigl|\hat{\phi}(f)\bigr|^2, \quad \Bigl( W \le \frac{1}{2T_s},\ f \in \Bigl(-\frac{1}{2T_s}, +\frac{1}{2T_s}\Bigr) \Bigr). \tag{11.24}$$

To convince yourself of (11.24), consider, for example, the term corresponding to
$j = 1$, namely, $|\hat{\phi}(f + 1/T_s)|^2$. By the definition of bandwidth, it is zero whenever
$|f + 1/T_s| > W$, i.e., whenever $f > -1/T_s + W$ or $f < -1/T_s - W$. Since the
former category $f > -1/T_s + W$ includes (by our assumption that $W \le 1/(2T_s)$)
all frequencies $f > -1/(2T_s)$, we conclude that the term corresponding to $j = 1$
is zero for all the frequencies f in the open interval $\bigl(-\frac{1}{2T_s}, +\frac{1}{2T_s}\bigr)$. More generally,
the j-th term $|\hat{\phi}(f + j/T_s)|^2$ is zero for all frequencies f satisfying the condition
$|f + j/T_s| > W$, a condition that is satisfied (assuming $j \ne 0$ and $W \le 1/(2T_s)$) by
the frequencies in the open interval that is of interest to us, $\bigl(-\frac{1}{2T_s}, +\frac{1}{2T_s}\bigr)$.

For $W \le 1/(2T_s)$ we thus obtain from (11.24) that the condition (11.18) implies
(11.21), and, in particular, that $W = 1/(2T_s)$. □

4 In the remainder of the proof we assume that $\hat{\phi}(f)$ is zero for frequencies f satisfying $|f| > W$.
The proof can be easily adjusted to account for the fact that, for frequencies $|f| > W$, it is possible
that $\hat{\phi}(\cdot)$ be nonzero on a set of Lebesgue measure zero.

Functions satisfying (11.21) are seldom used in digital communication because they
typically decay like 1/t so that even if the transmitted symbols $X_\ell$ are bounded,
the signal $X(t)$ may take on very high values (albeit quite rarely). Consequently,
the pulses $\phi$ that are used in practice have a larger bandwidth than $1/(2T_s)$.

This leads to the following definition.

Definition 11.3.6 (Excess Bandwidth). The excess bandwidth in percent of a
signal $\phi$ relative to $T_s > 0$ is defined as

$$100\% \left( \frac{\text{bandwidth of } \phi}{1/(2T_s)} - 1 \right). \tag{11.25}$$
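
Definition (11.25) amounts to a one-line computation. The sketch below uses a hypothetical function name and hypothetical numbers to show how it is evaluated.

```python
def excess_bandwidth_percent(W, Ts):
    """Excess bandwidth (11.25), in percent, of a pulse of bandwidth W Hz
    relative to the baud period Ts seconds."""
    return 100.0 * (W / (1.0 / (2.0 * Ts)) - 1.0)

# Hypothetical example: Ts = 1 second and W = 0.75 Hz.
print(excess_bandwidth_percent(0.75, 1.0))   # 50.0
```

A pulse of the minimum bandwidth $1/(2T_s)$ has 0% excess bandwidth, and a pulse of bandwidth $1/T_s$ has 100%.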

The following corollary to Corollary 11.3.4 is useful for the understanding of real
signals of excess bandwidth smaller than 100%.

Corollary 11.3.7 (Band-Edge Symmetry). Let $T_s$ be positive, and let $\phi$ be a real
energy-limited signal that is bandlimited to W Hz, where $W < 1/T_s$ so $\phi$ is of excess
bandwidth smaller than 100%. Then the time shifts of $\phi$ by integer multiples of $T_s$
are orthonormal if, and only if, $f \mapsto |\hat{\phi}(f)|^2$ satisfies the band-edge symmetry
condition⁵

$$\Bigl| \hat{\phi}\Bigl(\frac{1}{2T_s} - f\Bigr) \Bigr|^2 + \Bigl| \hat{\phi}\Bigl(\frac{1}{2T_s} + f\Bigr) \Bigr|^2 \equiv T_s, \quad 0 < f \le \frac{1}{2T_s}. \tag{11.26}$$

Proof. We first note that, since we have assumed that $W < 1/T_s$, only the terms
corresponding to $j = -1$, $j = 0$, and $j = 1$ contribute to the sum on the LHS of
(11.18) for $f \in \bigl(-\frac{1}{2T_s}, +\frac{1}{2T_s}\bigr)$. Moreover, since $\phi$
is by hypothesis real, it follows that $|\hat{\phi}(-f)| = |\hat{\phi}(f)|$, so the sum
on the LHS of (11.18) is a symmetric function of $f$. Thus, the sum is equal to $T_s$ on
the interval $\bigl(-\frac{1}{2T_s}, +\frac{1}{2T_s}\bigr)$ if, and only if, it is
equal to $T_s$ on the interval $\bigl[0, +\frac{1}{2T_s}\bigr)$. For frequencies in this
shorter interval only two terms in the sum contribute: those corresponding to $j = 0$
and $j = -1$. We



5 Condition (11.26) should be understood to indicate that the LHS and RHS of (11.26) are
equal for all frequencies $0 \le f \le 1/(2T_s)$ outside a set of Lebesgue measure zero.
Again, we ignore this issue in the proof and assume that $\hat{\phi}(f)$ is zero for all
$|f| > W$.
Figure 11.2: If $W < 1/(2T_s)$, then all the terms of the form
$|\hat{\phi}(f + j/T_s)|^2$ are zero over the shaded frequencies $W < |f| < 1/(2T_s)$.
Thus, for $W < 1/(2T_s)$ the sum $\sum_{j=-\infty}^{\infty} |\hat{\phi}(f + j/T_s)|^2$
cannot be equal to $T_s$ at any of the shaded frequencies.
Figure 11.3: An example of a choice for $|\hat{\phi}(\cdot)|^2$ satisfying the band-edge
symmetry condition (11.26).



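thus conclude that, for real signals of excess bandwidth smaller than 100%, the
condition (11.18) is equivalent to the condition

\[
  |\hat{\phi}(f)|^2 + |\hat{\phi}(f - 1/T_s)|^2 \equiv T_s,
  \quad 0 \le f < \frac{1}{2T_s}.
\]

Substituting $f' = \frac{1}{2T_s} - f$ in this condition leads to the condition

\[
  \Bigl|\hat{\phi}\Bigl(\frac{1}{2T_s} - f'\Bigr)\Bigr|^2
  + \Bigl|\hat{\phi}\Bigl(-f' - \frac{1}{2T_s}\Bigr)\Bigr|^2 \equiv T_s,
  \quad 0 < f' \le \frac{1}{2T_s},
\]

which, in view of the symmetry of $|\hat{\phi}(\cdot)|$, is equivalent to

\[
  \Bigl|\hat{\phi}\Bigl(\frac{1}{2T_s} - f'\Bigr)\Bigr|^2
  + \Bigl|\hat{\phi}\Bigl(f' + \frac{1}{2T_s}\Bigr)\Bigr|^2 \equiv T_s,
  \quad 0 < f' \le \frac{1}{2T_s},
\]

i.e., to (11.26). □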



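Note 11.3.8. The band-edge symmetry condition (11.26) has a nice geometric
interpretation. This is best seen by rewriting the condition in the form

\[
  \underbrace{\Bigl|\hat{\phi}\Bigl(\frac{1}{2T_s} - f'\Bigr)\Bigr|^2 - \frac{T_s}{2}}_{=\tilde{g}(-f')}
  = - \underbrace{\Bigl( \Bigl|\hat{\phi}\Bigl(\frac{1}{2T_s} + f'\Bigr)\Bigr|^2 - \frac{T_s}{2} \Bigr)}_{=\tilde{g}(f')},
  \quad 0 < f' \le \frac{1}{2T_s}, \tag{11.27}
\]

which demonstrates that the band-edge condition is equivalent to the condition
that the plot of $f \mapsto |\hat{\phi}(f)|^2$ in the interval $0 < f < 1/T_s$ be
invariant with respect to a 180°-rotation around the point
$\bigl(\frac{1}{2T_s}, \frac{T_s}{2}\bigr)$. In other words, the function
$\tilde{g}\colon f' \mapsto \bigl|\hat{\phi}\bigl(\frac{1}{2T_s} + f'\bigr)\bigr|^2 - \frac{T_s}{2}$
should be anti-symmetric for $0 < f' \le \frac{1}{2T_s}$. I.e., it should satisfy

\[
  \tilde{g}(-f') = -\tilde{g}(f'), \quad 0 < f' \le \frac{1}{2T_s}.
\]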
Figure 11.4: A plot of $f \mapsto |\hat{\phi}(f)|^2$ as given in (11.30) with $\beta = 0.5$.

Figure 11.3 is a plot over the interval $[0, 1/T_s)$ of a mapping
$f \mapsto |\hat{\phi}(f)|^2$ that satisfies the band-edge symmetry condition (11.26).

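A popular choice of $\phi$ is based on the raised-cosine family of functions. For every
$0 < \beta \le 1$ and every $T_s > 0$, the raised-cosine function is given by the mapping

\[
  f \mapsto
  \begin{cases}
    T_s & \text{if } 0 \le |f| \le \frac{1-\beta}{2T_s}, \\
    \frac{T_s}{2} \Bigl( 1 + \cos\Bigl( \frac{\pi T_s}{\beta} \bigl( |f| - \frac{1-\beta}{2T_s} \bigr) \Bigr) \Bigr)
      & \text{if } \frac{1-\beta}{2T_s} < |f| \le \frac{1+\beta}{2T_s}, \\
    0 & \text{if } |f| > \frac{1+\beta}{2T_s}.
  \end{cases}
  \tag{11.28}
\]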



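Choosing $\phi$ so that its Fourier Transform is the square root of the raised-cosine
mapping (11.28)

\[
  \hat{\phi}(f) =
  \begin{cases}
    \sqrt{T_s} & \text{if } 0 \le |f| \le \frac{1-\beta}{2T_s}, \\
    \sqrt{\frac{T_s}{2}} \sqrt{ 1 + \cos\Bigl( \frac{\pi T_s}{\beta} \bigl( |f| - \frac{1-\beta}{2T_s} \bigr) \Bigr) }
      & \text{if } \frac{1-\beta}{2T_s} < |f| \le \frac{1+\beta}{2T_s}, \\
    0 & \text{if } |f| > \frac{1+\beta}{2T_s},
  \end{cases}
  \tag{11.29}
\]

results in $\phi$ being real with

\[
  |\hat{\phi}(f)|^2 =
  \begin{cases}
    T_s & \text{if } 0 \le |f| \le \frac{1-\beta}{2T_s}, \\
    \frac{T_s}{2} \Bigl( 1 + \cos\Bigl( \frac{\pi T_s}{\beta} \bigl( |f| - \frac{1-\beta}{2T_s} \bigr) \Bigr) \Bigr)
      & \text{if } \frac{1-\beta}{2T_s} < |f| \le \frac{1+\beta}{2T_s}, \\
    0 & \text{if } |f| > \frac{1+\beta}{2T_s},
  \end{cases}
  \tag{11.30}
\]

as depicted in Figure 11.4 for $\beta = 0.5$.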

Using (11.29) and using the band-edge symmetry criterion (Corollary 11.3.7), it
can be readily verified that the time shifts of $\phi$ by integer multiples of $T_s$ are
orthonormal. Moreover, by (11.29), $\phi$ is bandlimited to $(1 + \beta)/(2T_s)$ Hz. It is
thus of excess bandwidth $\beta \times 100\%$. For every $0 < \beta \le 1$ we have thus
found a pulse $\phi$ of excess bandwidth $\beta \times 100\%$ whose time shifts by integer
multiples of $T_s$ are orthonormal.
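The band-edge symmetry of the raised-cosine mapping can also be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names, the grid, and the values $T_s = 2$ and $\beta = 0.5$ are illustrative choices and are not taken from the text.

```python
import numpy as np

def raised_cosine(f, beta, Ts):
    """Raised-cosine mapping (11.28), the candidate for |phi_hat(f)|^2."""
    f = np.abs(np.asarray(f, dtype=float))
    f1 = (1.0 - beta) / (2.0 * Ts)   # end of the flat part
    f2 = (1.0 + beta) / (2.0 * Ts)   # bandwidth of the pulse
    roll = 0.5 * Ts * (1.0 + np.cos(np.pi * Ts / beta * (f - f1)))
    return np.where(f <= f1, Ts, np.where(f <= f2, roll, 0.0))

def excess_bandwidth_percent(bandwidth, Ts):
    """Excess bandwidth in percent, as in (11.25)."""
    return 100.0 * (bandwidth / (1.0 / (2.0 * Ts)) - 1.0)

Ts, beta = 2.0, 0.5
f = np.linspace(1e-6, 1.0 / (2.0 * Ts), 1001)   # grid on (0, 1/(2 Ts)]

# Left-hand side of the band-edge symmetry condition (11.26).
lhs = raised_cosine(0.5 / Ts - f, beta, Ts) + raised_cosine(0.5 / Ts + f, beta, Ts)
max_deviation = float(np.max(np.abs(lhs - Ts)))

excess = excess_bandwidth_percent((1.0 + beta) / (2.0 * Ts), Ts)
print(max_deviation, excess)
```

On this grid the deviation from $T_s$ should be at the level of floating-point rounding, and the computed excess bandwidth should equal $\beta \times 100\%$.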
Figure 11.5: The pulse $\phi(\cdot)$ of (11.31) with $\beta = 0.5$ and its self-similarity
function $R_{\phi\phi}(\cdot)$ of (11.32).



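In the time domain

\[
  \phi(t) = \frac{4\beta}{\pi \sqrt{T_s}} \,
  \frac{ \cos\bigl( (1+\beta) \pi \frac{t}{T_s} \bigr)
         + \frac{ \sin\bigl( (1-\beta) \pi \frac{t}{T_s} \bigr) }{ 4\beta \frac{t}{T_s} } }
       { 1 - \bigl( 4\beta \frac{t}{T_s} \bigr)^2 },
  \quad t \in \mathbb{R}, \tag{11.31}
\]

with corresponding self-similarity function

\[
  R_{\phi\phi}(\tau) = \operatorname{sinc}\Bigl( \frac{\tau}{T_s} \Bigr) \,
  \frac{ \cos( \pi \beta \tau / T_s ) }{ 1 - 4 \beta^2 \tau^2 / T_s^2 },
  \quad \tau \in \mathbb{R}. \tag{11.32}
\]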



The pulse of (11.31) is plotted in Figure 11.5 (top) for $\beta = 0.5$. Its self-similarity
function (11.32) is plotted in the same figure (bottom). That the time shifts of $\phi$
by integer multiples of $T_s$ are orthonormal can be verified again by observing that
$R_{\phi\phi}$ as given in (11.32) satisfies $R_{\phi\phi}(\ell T_s) = \mathrm{I}\{\ell = 0\}$
for all $\ell \in \mathbb{Z}$.

Notice also that if $\phi(\cdot)$ is chosen as in (11.31), then for all $0 < \beta \le 1$,
the pulse $\phi(\cdot)$ decays like $1/t^2$. This decay property combined with the fact
that the infinite sum $\sum_{\nu=1}^{\infty} \nu^{-2}$ converges (Rudin, 1976, Chapter 3,
Theorem 3.28) will prove useful in Section 14.3 when we discuss the power in PAM.
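The orthonormality check $R_{\phi\phi}(\ell T_s) = \mathrm{I}\{\ell = 0\}$ can be illustrated numerically by inverting the raised-cosine spectrum with a simple quadrature rule. This is only a sketch under stated assumptions: NumPy is assumed, `np.sinc` is the normalized sinc $\sin(\pi x)/(\pi x)$, the helper names are hypothetical, and the trapezoidal rule is a choice made here for illustration, not a method from the text.

```python
import numpy as np

def raised_cosine(f, beta, Ts):
    """|phi_hat(f)|^2 of (11.30)."""
    f = np.abs(np.asarray(f, dtype=float))
    f1 = (1.0 - beta) / (2.0 * Ts)
    f2 = (1.0 + beta) / (2.0 * Ts)
    roll = 0.5 * Ts * (1.0 + np.cos(np.pi * Ts / beta * (f - f1)))
    return np.where(f <= f1, Ts, np.where(f <= f2, roll, 0.0))

def self_similarity_numeric(tau, beta, Ts, n=20001):
    """Integral of |phi_hat(f)|^2 cos(2 pi f tau) df by the trapezoidal rule."""
    f2 = (1.0 + beta) / (2.0 * Ts)
    f = np.linspace(-f2, f2, n)
    y = raised_cosine(f, beta, Ts) * np.cos(2.0 * np.pi * f * tau)
    df = f[1] - f[0]
    return float(df * (np.sum(y) - 0.5 * (y[0] + y[-1])))

def self_similarity_closed_form(tau, beta, Ts):
    """(11.32), evaluated away from the removable singularities |tau| = Ts/(2 beta)."""
    x = tau / Ts
    return float(np.sinc(x) * np.cos(np.pi * beta * x) / (1.0 - (2.0 * beta * x) ** 2))

Ts, beta = 2.0, 0.5
at_multiples = [self_similarity_numeric(ell * Ts, beta, Ts) for ell in range(6)]
numeric = self_similarity_numeric(0.3 * Ts, beta, Ts)
closed = self_similarity_closed_form(0.3 * Ts, beta, Ts)
print(at_multiples, numeric, closed)
```

The first entry of `at_multiples` should be close to one and the others close to zero. The closed form is compared only at a point where its denominator does not vanish.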
11.4 The Self-Similarity Function of Integrable Signals

This section is a bit technical and can be omitted at first reading. In it we define 
the self-similarity function for integrable signals that are not necessarily energy- 
limited, and we then compute the Fourier Transform of the so-defined self-similarity 
function. 

Recall that a Lebesgue measurable complex signal $v\colon \mathbb{R} \to \mathbb{C}$ is
integrable if $\int_{-\infty}^{\infty} |v(t)| \, dt < \infty$ and that the class of
integrable signals is denoted by $\mathcal{L}_1$. For such signals there may be $\tau$'s
for which the integral in (11.2) is undefined. For example, if $v$ is not energy-limited,
then the integral in (11.2) will be infinite at $\tau = 0$. Nevertheless, we can discuss
the self-similarity function of such signals by adopting the convolution representation
of Proposition 11.2.2 as the definition. We thus define the self-similarity function
$R_{vv}$ of an integrable signal $v \in \mathcal{L}_1$ as

\[
  R_{vv} \triangleq v \star \overleftarrow{v}^{*}, \quad v \in \mathcal{L}_1, \tag{11.33}
\]

but we need some clarification. Since $v$ is integrable, and since this implies that
its reflected image $\overleftarrow{v}\colon t \mapsto v(-t)$ is also integrable, it
follows that the convolution in (11.33) is a convolution between two integrable signals.
As such, we are guaranteed by the discussion leading to (5.9) that the integral

\[
  \int_{-\infty}^{\infty} v(\sigma) \, \overleftarrow{v}^{*}(\tau - \sigma) \, d\sigma
  = \int_{-\infty}^{\infty} v(t + \tau) \, v^{*}(t) \, dt
\]

is defined for all $\tau$'s outside a set of Lebesgue measure zero. (This set of Lebesgue
measure zero will include the point $\tau = 0$ if $v$ is not of finite energy.) For
$\tau$'s inside this set of measure zero we define the self-similarity function to be
zero. The value zero is quite arbitrary because, irrespective of the value we choose for
such $\tau$'s, we are guaranteed by (5.9) that the so-defined self-similarity function
$R_{vv}$ is integrable

\[
  \int_{-\infty}^{\infty} |R_{vv}(\tau)| \, d\tau \le \|v\|_1^2,
  \quad v \in \mathcal{L}_1, \tag{11.34}
\]

and that its $\mathcal{L}_1$-Fourier Transform is given by the product of the
$\mathcal{L}_1$-Fourier Transform of $v$ and the $\mathcal{L}_1$-Fourier Transform of
$\overleftarrow{v}^{*}$, i.e.,

\[
  \hat{R}_{vv}(f) = |\hat{v}(f)|^2, \quad f \in \mathbb{R}, \quad v \in \mathcal{L}_1. \tag{11.35}
\]
11.5 Exercises

Exercise 11.1 (Passband Signaling). Let $f_0, T_s > 0$ be fixed.

(i) Show that a signal $x$ is a Nyquist Pulse of parameter $T_s$ if, and only if, the
signal $t \mapsto e^{i 2\pi f_0 t} x(t)$ is such a pulse.

(ii) Show that if $x$ is a Nyquist Pulse of parameter $T_s$, then so is
$t \mapsto \cos(2\pi f_0 t) \, x(t)$.

(iii) If $t \mapsto \cos(2\pi f_0 t) \, x(t)$ is a Nyquist Pulse of parameter $T_s$, must
$x$ also be one?






Exercise 11.2 (The Self-Similarity Function of a Delayed Signal). Let $u$ be an
energy-limited signal, and let the signal $v$ be given by $v\colon t \mapsto u(t - t_0)$.
Express the self-similarity function of $v$ in terms of the self-similarity function of
$u$ and $t_0$.

Exercise 11.3 (The Self-Similarity Function of a Frequency Shifted Signal). Let $u$ be
an energy-limited complex signal, and let the signal $v$ be given by
$v\colon t \mapsto u(t) \, e^{i 2\pi f_0 t}$ for some $f_0 \in \mathbb{R}$. Express the
self-similarity function of $v$ in terms of $f_0$ and the self-similarity function of $u$.

Exercise 11.4 (A Self-Similarity Function). Compute and plot the self-similarity function
of the signal $t \mapsto A \, (1 - |t|/T) \, \mathrm{I}\{|t| \le T\}$.

Exercise 11.5 (Symmetry of the FT of the Self-Similarity Function of a Real Signal).
Show that if $\phi$ is an integrable real signal, then the FT of its self-similarity
function is symmetric:

\[
  \hat{R}_{\phi\phi}(f) = \hat{R}_{\phi\phi}(-f), \quad f \in \mathbb{R},
  \quad \phi \in \mathcal{L}_1 \text{ is real.}
\]

Exercise 11.6 (The Self-Similarity Function is Positive Definite). Show that if $v$ is an
energy-limited signal, $n$ is a positive integer, $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$,
and $t_1, \ldots, t_n \in \mathbb{R}$, then

\[
  \sum_{j=1}^{n} \sum_{\ell=1}^{n} \alpha_j \, \alpha_\ell^{*} \, R_{vv}(t_j - t_\ell) \ge 0.
\]

Hint: Compute the energy in the signal $t \mapsto \sum_{j=1}^{n} \alpha_j \, v(t + t_j)$.

Exercise 11.7 (Relaxing the Orthonormality Condition). What is the minimal bandwidth 
of an energy-limited signal whose time shifts by even multiples of T s are orthonormal? 
What is the minimal bandwidth of an energy-limited signal whose time shifts by odd 
multiples of T s are orthonormal? 

Exercise 11.8 (A Specific Signal). Let $p$ be the complex energy-limited bandlimited signal
whose FT $\hat{p}$ is given by

\[
  \hat{p}(f) = T_s \, \bigl( 1 - |T_s f - 1| \bigr) \, \mathrm{I}\Bigl\{ 0 \le f \le \frac{2}{T_s} \Bigr\},
  \quad f \in \mathbb{R}.
\]

(i) Plot $\hat{p}(\cdot)$.

(ii) Is $p(\cdot)$ a Nyquist Pulse of parameter $T_s$?

(iii) Is the real part of $p(\cdot)$ a Nyquist Pulse of parameter $T_s$?

(iv) What about the imaginary part of $p(\cdot)$?

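Exercise 11.9 (Nyquist's Third Criterion). We say that an energy-limited signal
$\psi(\cdot)$ satisfies Nyquist's Third Criterion if

\[
  \int_{(2\nu-1) T_s/2}^{(2\nu+1) T_s/2} \psi(t) \, dt =
  \begin{cases}
    1 & \text{if } \nu = 0, \\
    0 & \text{if } \nu \in \mathbb{Z} \setminus \{0\}.
  \end{cases}
  \tag{11.36}
\]

(i) Express the LHS of (11.36) as an inner product between $\psi$ and some function $g_\nu$.

(ii) Show that (11.36) is equivalent to

\[
  T_s \int_{-\infty}^{\infty} \hat{\psi}(f) \, e^{i 2\pi f \nu T_s} \, \operatorname{sinc}(T_s f) \, df =
  \begin{cases}
    1 & \text{if } \nu = 0, \\
    0 & \text{if } \nu \in \mathbb{Z} \setminus \{0\}.
  \end{cases}
\]

(iii) Show that, loosely speaking, $\psi$ satisfies Nyquist's Third Criterion if, and only if,

\[
  \sum_{j=-\infty}^{\infty} \hat{\psi}\Bigl( f - \frac{j}{T_s} \Bigr) \operatorname{sinc}(T_s f - j)
\]

is indistinguishable from the all-one function. More precisely, if and only if,

\[
  \lim_{J \to \infty} \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}}
  \Bigl| 1 - \sum_{j=-J}^{J} \hat{\psi}\Bigl( f - \frac{j}{T_s} \Bigr) \operatorname{sinc}(T_s f - j) \Bigr| \, df = 0.
\]

(iv) What is the FT of the pulse of least bandwidth that satisfies Nyquist's Third
Criterion with respect to the baud $T_s$? What is its bandwidth?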

Exercise 11.10 (Multiplication by a Carrier).

(i) Let $u$ be an energy-limited complex signal that is bandlimited to $W$ Hz, and let
$f_0 > W$ be given. Let $v$ be the signal $v\colon t \mapsto u(t) \cos(2\pi f_0 t)$.
Express the self-similarity function of $v$ in terms of $f_0$ and the self-similarity
function of $u$.

(ii) Let the signal $\phi$ be given by $\phi\colon t \mapsto \sqrt{2} \cos(2\pi f_c t) \, \psi(t)$,
where $f_c > W/2 > 0$; where $4 f_c T_s$ is an odd integer; and where $\psi$ is a real
energy-limited signal that is bandlimited to $W/2$ Hz and whose time shifts by integer
multiples of $(2T_s)$ are orthonormal. Show that the time shifts of $\phi$ by integer
multiples of $T_s$ are orthonormal.

Exercise 11.11 (The Self-Similarity of a Convolution). Let $p$ and $q$ be integrable signals
of self-similarity functions $R_{pp}$ and $R_{qq}$. Show that the self-similarity function
of their convolution $p \star q$ is indistinguishable from $R_{pp} \star R_{qq}$.



Chapter 12 

Stochastic Processes: Definition 

12.1 Introduction and Continuous-Time Heuristics 

In this chapter we shall define stochastic processes. Our definition will be general so 
as to include the continuous-time stochastic processes of the type we encountered 
in Section 10.2 and also discrete-time processes. 

In Section 10.2 we saw that since the data bits that we wish to communicate 
are random, the transmitted waveform is a stochastic process. But stochastic 
processes play an important role in Digital Communications not only in modeling 
the transmitted signals: they are also used to model the noise in the system and 
other sources of impairments. 

The stochastic processes we encountered in Section 10.2 are continuous-time processes.
We proposed that you think about such a process as a real-valued function
of two variables: "time" and "luck." By "luck" we mean the realization of all the
random components of the system, e.g., the bits to be sent, the realization of the
noise processes (that we shall discuss later), or any other sources of randomness in
the system.

Somewhat more precisely, recall that a probability space is defined as a triplet
$(\Omega, \mathcal{F}, P)$, where the set $\Omega$ is the set of experiment outcomes, the
set $\mathcal{F}$ is the set of events, and where $P(\cdot)$ assigns probabilities to the
various events. A measurable real-valued function of the outcome is a random variable,
and a function of time and the experiment outcome is a random process or a stochastic
process. A continuous-time stochastic process $X$ is thus a mapping

\[
  X\colon \Omega \times \mathbb{R} \to \mathbb{R}, \quad (\omega, t) \mapsto X(\omega, t).
\]

If we fix some experiment outcome $\omega \in \Omega$, then the random process can be
regarded as a function of one argument: time. This function is sometimes called a
sample-path, trajectory, sample-path realization, or a sample function

\[
  X(\omega, \cdot)\colon \mathbb{R} \to \mathbb{R}, \quad t \mapsto X(\omega, t).
\]
Figure 12.1: The pulse shape $g\colon t \mapsto (1 - 4|t|/T_s) \, \mathrm{I}\{|t| < T_s/4\}$,
and the sample function $t \mapsto \sum_{\ell=-4}^{4} x_\ell \, g(t - \ell T_s)$ when
$(x_{-4}, x_{-3}, x_{-2}, x_{-1}, x_0, x_1, x_2, x_3, x_4) = (-1, -1, +1, +1, -1, +1, -1, -1, -1)$.



Similarly, if we fix an epoch $t \in \mathbb{R}$ and view the stochastic process as a
function of "luck" only, we obtain a random variable:

\[
  X(\cdot, t)\colon \Omega \to \mathbb{R}, \quad \omega \mapsto X(\omega, t).
\]

This random variable is sometimes called the value of the process at time $t$ or
the time-$t$ sample of the process.
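The two ways of slicing a process (fixing "luck" or fixing time) can be made concrete on a toy example. The sketch below is illustrative only: the finite sample space of eight symbol triples, the choice $T_s = 1$, and all function names are assumptions made here, and the pulse is the triangular one of Figure 12.1.

```python
from itertools import product

Ts = 1.0

def g(t):
    """Triangular pulse of Figure 12.1: 1 - 4|t|/Ts for |t| <= Ts/4, zero elsewhere."""
    return max(0.0, 1.0 - 4.0 * abs(t) / Ts)

# Hypothetical finite sample space: each outcome omega is a triple (x_{-1}, x_0, x_1).
OMEGA = list(product((-1, +1), repeat=3))

def X(omega, t):
    """The process viewed as a function of both 'luck' omega and time t."""
    return sum(x * g(t - ell * Ts) for ell, x in zip((-1, 0, 1), omega))

def sample_path(omega):
    """Fix omega: what remains is a function of time only."""
    return lambda t: X(omega, t)

def time_sample(t):
    """Fix t: what remains is a random variable, i.e., a function of omega only."""
    return {omega: X(omega, t) for omega in OMEGA}

path = sample_path((-1, +1, +1))
values = [path(t) for t in (-1.0, 0.0, 0.125, 1.0)]
rv = time_sample(0.0)   # at t = 0 only the middle symbol x_0 contributes
print(values, rv)
```

Here `sample_path` returns an ordinary function of time, while `time_sample` returns a table assigning a real number to every outcome, which is what a random variable on a finite sample space amounts to.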

Figure 12.1 shows the pulse shape
$g\colon t \mapsto (1 - 4|t|/T_s) \, \mathrm{I}\{|t| < T_s/4\}$ and a sample-path of the
PAM signal

\[
  X(t) = \sum_{\ell=-4}^{4} X_\ell \, g(t - \ell T_s) \tag{12.1}
\]

with $\{X_\ell\}$ taking value in the set $\{-1, +1\}$. Notice that in this example the
functions $t \mapsto g(t - \ell T_s)$ and $t \mapsto g(t - \ell' T_s)$ do not "overlap"
if $\ell \ne \ell'$.

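Figure 12.2 shows the pulse shape

\[
  g\colon t \mapsto
  \begin{cases}
    1 - \frac{4}{3T_s} |t| & \text{if } |t| \le \frac{3T_s}{4}, \\
    0 & \text{if } |t| > \frac{3T_s}{4},
  \end{cases}
  \quad t \in \mathbb{R}, \tag{12.2}
\]

and a sample-path of the PAM signal (12.1) for $\{X_\ell\}$ taking value in the set
$\{-1, +1\}$. In this example the mappings $t \mapsto g(t - \ell T_s)$ and
$t \mapsto g(t - \ell' T_s)$ do overlap (when $\ell' \in \{\ell - 1, \ell, \ell + 1\}$).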
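The difference between the two pulse shapes can be seen by synthesizing both sample paths from the same symbols. This is a rough sketch, assuming NumPy; the wide pulse is taken to be a triangle supported on $|t| \le 3T_s/4$, which is an assumption about (12.2), and $T_s = 1$, the time grid, and the helper names are likewise illustrative.

```python
import numpy as np

Ts = 1.0

def g_narrow(t):
    """Pulse of Figure 12.1: triangle supported on |t| <= Ts/4."""
    return np.maximum(0.0, 1.0 - 4.0 * np.abs(t) / Ts)

def g_wide(t):
    """Assumed form of the pulse (12.2): triangle supported on |t| <= 3 Ts/4."""
    return np.maximum(0.0, 1.0 - 4.0 * np.abs(t) / (3.0 * Ts))

def pam(symbols, pulse, t):
    """Sample path t -> sum over ell = -4..4 of x_ell * pulse(t - ell Ts), as in (12.1)."""
    ells = np.arange(-4, 5)
    return sum(x * pulse(t - ell * Ts) for ell, x in zip(ells, symbols))

def pulses_overlap(pulse, ell, ell_prime, t):
    """True if the two shifted pulses are both nonzero at some point of the grid t."""
    return bool(np.any((pulse(t - ell * Ts) != 0) & (pulse(t - ell_prime * Ts) != 0)))

x = np.array([-1, -1, +1, +1, -1, +1, -1, -1, -1])   # symbols used in Figure 12.1
t = np.linspace(-5.0, 5.0, 2001)

path_narrow = pam(x, g_narrow, t)
path_wide = pam(x, g_wide, t)
print(pulses_overlap(g_narrow, 0, 1, t),
      pulses_overlap(g_wide, 0, 1, t),
      pulses_overlap(g_wide, 0, 2, t))
```

With the narrow pulse, shifts by different multiples of $T_s$ never overlap. With the wide pulse, adjacent shifts overlap but shifts two symbols apart do not.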
Figure 12.2: The pulse shape $g$ of (12.2) and the trajectory $t \mapsto \sum_{\ell=-
for $(x_{-4}, x_{-3}, x_{-2}, x_{-1}, x_0, x_1, x_2, x_3, x_4) = (-1, -1, +1, +1, -1, +1, -
12.2 A Formal Definition

We next give a formal definition of a stochastic process, which is also called a 
random process, or a random function. 

Definition 12.2.1 (Stochastic Process). A stochastic process $(X(t), t \in \mathcal{T})$
is an indexed family of random variables that are defined on a common probability space
$(\Omega, \mathcal{F}, P)$. Here $\mathcal{T}$ denotes the indexing set and $X(t)$ (or
sometimes $X_t$) denotes the random variable indexed by $t$.

Thus, $X(t)$ is the random variable to which $t \in \mathcal{T}$ is mapped. For each
$t \in \mathcal{T}$ we have that $X(t)$ is a random variable, i.e., a measurable mapping
from the experiment outcomes set $\Omega$ to the reals.$^1$

A stochastic process $(X(t), t \in \mathcal{T})$ is said to be centered or of zero mean
if all the random variables in the family are of zero mean, i.e., if for every
$t \in \mathcal{T}$ we have $\mathrm{E}[X(t)] = 0$. It is said to be of finite variance if
all the random variables in the family are of finite variance, i.e., if
$\mathrm{E}[X^2(t)] < \infty$ for all $t \in \mathcal{T}$.

The case where the indexing set T comprises only one element is not particularly 
exciting because in this case the stochastic process is just a random variable with 
fancy packaging. Similarly, when T is finite, the SP is just a random vector or a 
tuple of random variables in disguise. The cases that will be of most interest are 
enumerated below. 



1 Some authors, e.g., (Doob, 1990), allow for $X(t)$ to take on the values $\pm\infty$
provided that at each $t \in \mathcal{T}$ this occurs with zero probability, but we,
following (Loève, 1963), insist that $X(t)$ only take on finite values.

(i) When the indexing set $\mathcal{T}$ is the set of integers $\mathbb{Z}$, the stochastic
process is said to be a discrete-time stochastic process and in this case it is simply
a bi-infinite sequence of random variables

\[
  \ldots, X_{-2}, X_{-1}, X_0, X_1, X_2, \ldots
\]

For discrete-time stochastic processes it is customary to denote the random
variable to which $\nu \in \mathbb{Z}$ is mapped by $X_\nu$ rather than $X(\nu)$ and to
refer to $X_\nu$ as the time-$\nu$ sample of the process $(X_\nu, \nu \in \mathbb{Z})$.

(ii) When the indexing set is the set of positive integers $\mathbb{N}$, the stochastic
process is said to be a one-sided discrete-time stochastic process and it is simply
a one-sided sequence of random variables

\[
  X_1, X_2, \ldots
\]

Again, we refer to $X_\nu$ as the time-$\nu$ sample of $(X_\nu, \nu \in \mathbb{N})$.

(iii) When the indexing set $\mathcal{T}$ is the real line $\mathbb{R}$, the stochastic
process is said to be a continuous-time stochastic process and the random variable $X(t)$
is the time-$t$ sample of $(X(t), t \in \mathbb{R})$.

In dealing with continuous-time stochastic processes we shall usually denote the
process by $(X(t), t \in \mathbb{R})$, by $X$, by $X(\cdot)$, or by $(X(t))$. The random
variable to which $t$ is mapped, i.e., the time-$t$ sample of the process, will be denoted
by $X(t)$. Its realization will be denoted by $x(t)$, and the sample-path of the process
by $x$ or $x(\cdot)$.

Discrete-time processes will typically be denoted by $(X_\nu, \nu \in \mathbb{Z})$ or by
$(X_\nu)$.

We shall need only a few results on discrete-time stochastic processes, and those will 
be presented in Chapter 13. Continuous-time stochastic processes will be discussed 
in Chapter 25. 

12.3 Describing Stochastic Processes 

The description of a continuous-time stochastic process in terms of a random vari- 
able (as in Section 10.2), in terms of a finite number of random variables (as in 
PAM signaling), or in terms of an infinite sequence of random variables (as in the 
transmission using PAM signaling of an infinite binary data stream) is particularly 
well suited for describing human-generated stochastic processes or stochastic pro- 
cesses that are generated using a mechanism that we fully understand. We simply 
describe how the stochastic process is synthesized from the random variables. The 
method is less useful when the stochastic process denotes a random signal (such 
as thermal noise or some other interference of unknown origin) that we observe 
rather than generate. In this case we can use measurements and statistical meth- 
ods to analyze the process. Often, the best we can hope for is to be informed 
of the finite-dimensional distributions of the process, a concept that will be 
introduced in Section 25.2. 
12.4 Additional Reading

Classic references on stochastic processes to which we shall frequently refer are
(Doob, 1990) and (Loève, 1963). We also recommend (Gikhman and Skorokhod,
1996), (Cramér and Leadbetter, 2004), and (Grimmett and Stirzaker, 2001). For
discrete-time stochastic processes, see (Pourahmadi, 2001) and (Porat, 2008).

12.5 Exercises 

Exercise 12.1 (Objects in a Basement). Let $T_1, T_2, \ldots$ be a sequence of positive
random variables, and let $N_1, N_2, \ldots$ be a sequence of random variables taking value
in $\mathbb{N}$. Define

\[
  X(t) = \sum_{j=1}^{\infty} N_j \, \mathrm{I}\{t \ge T_j\}, \quad t \in \mathbb{R}.
\]

Draw some sample paths of $(X(t), t \in \mathbb{R})$. Assume that at time zero a basement is
empty and that $N_j$ denotes the number of objects in the $j$-th box, which is brought down
to the basement at time $T_j$. Explain why you can think of $X(t)$ as the number of objects
in the basement at time $t$.

Exercise 12.2 (A Queue). Let $S_1, S_2, \ldots$ be a sequence of positive random variables. A
system is turned on at time zero. The first customer arrives at the system at time $S_1$
and the next at time $S_1 + S_2$. More generally, Customer $\eta$ arrives $S_\eta$ minutes
after Customer $(\eta - 1)$. The system serves one customer at a time. It takes the system
one minute to serve each customer, and a customer leaves the system once it has been served.
Let $X(t)$ denote the number of customers in the system at time $t$. Express $X(t)$ in terms
of $S_1, S_2, \ldots$ Is $(X(t), t \in \mathbb{R})$ a stochastic process? If so, draw a few
of its sample paths. Compute $\Pr[X(0.5) > 0]$. Express your answer in terms of the
distribution of $S_1, S_2, \ldots$

Exercise 12.3 (A Continuous-Time Markov SP). A particle is in State Zero at time $t = 0$.
It stays in that state for $T_1^{(0)}$ seconds and then jumps to State One. It stays in
State One for $T_1^{(1)}$ seconds and then jumps back to State Zero, where it stays for
$T_2^{(0)}$ seconds. In general, $T_\nu^{(0)}$ is the duration of the particle's stay in
State Zero on its $\nu$-th visit to that state. Similarly, $T_\nu^{(1)}$ is the duration of
its stay in State One on its $\nu$-th visit. Assume that
$T_1^{(0)}, T_1^{(1)}, T_2^{(0)}, T_2^{(1)}, T_3^{(0)}, T_3^{(1)}, \ldots$ are independent
with $T_\nu^{(0)}$ being a mean-$\mu_0$ exponential and with $T_\nu^{(1)}$ being a
mean-$\mu_1$ exponential for all $\nu \in \mathbb{N}$.

Let $X(t)$ be deterministically equal to zero for $t < 0$, and equal to the particle's state
for $t \ge 0$.

(i) Plot some sample paths of $(X(t), t \in \mathbb{R})$.

(ii) What is the probability that the sample path $t \mapsto X(\omega, t)$ is continuous in
the interval $[0, t)$?

(iii) Conditional on $X(t) = 0$, where $t \ge 0$, what is the distribution of the remaining
duration of the particle's stay in State Zero?

Hint: An exponential RV $X$ has the memoryless property, i.e., that for every $s, t \ge 0$ we
have $\Pr[X > s + t \mid X > t] = \Pr[X > s]$.




Exercise 12.4 (Peak Power). Let the random variables $(D_j, j \in \mathbb{Z})$ be IID, each
taking on the values 0 and 1 equiprobably. Let

\[
  X(t) = A \sum_{\ell=-\infty}^{\infty} (1 - 2 D_\ell) \, g(t - \ell T_s), \quad t \in \mathbb{R},
\]

where $A, T_s > 0$ and $g\colon t \mapsto \mathrm{I}\{|t| \le 3T_s/4\}$. Find the distribution
of the random variable

\[
  \sup_{t \in \mathbb{R}} |X(t)|.
\]

Exercise 12.5 (Sample-Path Continuity). Let the random variables $(D_j, j \in \mathbb{Z})$ be
IID, each taking on the values 0 and 1 equiprobably. Let

\[
  X(t) = A \sum_{\ell=-\infty}^{\infty} (1 - 2 D_\ell) \, g(t - \ell T_s), \quad t \in \mathbb{R},
\]

where $A, T_s > 0$. Suppose that the function $g\colon \mathbb{R} \to \mathbb{R}$ is continuous
and is zero outside some interval, so $g(t) = 0$ whenever $|t| \ge T$. Show that for every
$\omega \in \Omega$, the sample-path $t \mapsto X(\omega, t)$ is a continuous function of time.

Exercise 12.6 (Random Sampling Time). Consider the setup of Exercise 12.5, with the
pulse shape $g\colon t \mapsto (1 - 2|t|/T_s) \, \mathrm{I}\{|t| \le T_s/2\}$. Further assume
that the RV $T$ is independent of $(D_j, j \in \mathbb{Z})$ and uniformly distributed over the
interval $[-\delta, \delta]$. Find the distribution of $X(k T_s + T)$ for any integer $k$.

Exercise 12.7 (A Strange SP). Let $T$ be a mean-one exponential RV, and define the SP
$(X(t), t \in \mathbb{R})$ by

\[
  X(t) =
  \begin{cases}
    1 & \text{if } t = T, \\
    0 & \text{otherwise.}
  \end{cases}
\]

Compute the distribution of $X(t_1)$ and the joint distribution of $X(t_1)$ and $X(t_2)$ for
$t_1, t_2 \in \mathbb{R}$. What is the probability that the sample-path
$t \mapsto X(\omega, t)$ is continuous at $t_1$? What is the probability that the sample-path
is a continuous function (everywhere)?

Exercise 12.8 (The Sum of Stochastic Processes: Formalities). Let the stochastic processes
$(X_1(t), t \in \mathbb{R})$ and $(X_2(t), t \in \mathbb{R})$ be defined on the same
probability space $(\Omega, \mathcal{F}, P)$. Let $(Y(t), t \in \mathbb{R})$ be the SP
corresponding to their sum. Express $Y$ as a mapping from $\Omega \times \mathbb{R}$ to
$\mathbb{R}$. What is $Y(\omega, t)$ for $(\omega, t) \in \Omega \times \mathbb{R}$?

Exercise 12.9 (Independent Stochastic Processes). Let the SP $(X_1(t), t \in \mathbb{R})$ be
defined on the probability space $(\Omega_1, \mathcal{F}_1, P_1)$, and let
$(X_2(t), t \in \mathbb{R})$ be defined on the space $(\Omega_2, \mathcal{F}_2, P_2)$. Define
a new probability space $(\Omega, \mathcal{F}, P)$ with two stochastic processes
$(\tilde{X}_1(t), t \in \mathbb{R})$ and $(\tilde{X}_2(t), t \in \mathbb{R})$ such that for
every $\eta \in \mathbb{N}$ and epochs $t_1, \ldots, t_\eta \in \mathbb{R}$ the following
three conditions hold:

1) The joint law of $\tilde{X}_1(t_1), \ldots, \tilde{X}_1(t_\eta)$ is the same as the joint
law of $X_1(t_1), \ldots, X_1(t_\eta)$.

2) The joint law of $\tilde{X}_2(t_1), \ldots, \tilde{X}_2(t_\eta)$ is the same as the joint
law of $X_2(t_1), \ldots, X_2(t_\eta)$.

3) The $\eta$-tuple $\tilde{X}_1(t_1), \ldots, \tilde{X}_1(t_\eta)$ is independent of the
$\eta$-tuple $\tilde{X}_2(t_1), \ldots, \tilde{X}_2(t_\eta)$.

Hint: Consider $\Omega = \Omega_1 \times \Omega_2$.
Exercise 12.10 (Pathwise Integration). Let $(X_j, j \in \mathbb{Z})$ be IID random variables
defined over the probability space $(\Omega, \mathcal{F}, P)$, with $X_j$ taking on the values
0 and 1 equiprobably. Define the stochastic process $(X(t), t \in \mathbb{R})$ as

\[
  X(t) = \sum_{j=-\infty}^{\infty} X_j \, \mathrm{I}\{j \le t < j + 1\}, \quad t \in \mathbb{R}.
\]

For a given $n \in \mathbb{N}$, compute the distribution of the random variable

\[
  \omega \mapsto \int_0^n X(\omega, t) \, dt.
\]



Chapter 13 

Stationary Discrete-Time Stochastic 
Processes 



13.1 Introduction 

This chapter discusses some of the properties of real discrete-time stochastic pro- 
cesses. Extensions to complex discrete-time stochastic processes are discussed in 
Chapter 17. 



13.2 Stationary Processes 

A discrete-time stochastic process is said to be stationary if all equal-length tuples 
of consecutive samples have the same joint law. Thus: 

Definition 13.2.1 (Stationary Discrete-Time Processes). A discrete-time SP $(X_\nu)$
is said to be stationary or strict sense stationary or strongly stationary
if for every $n \in \mathbb{N}$ and all integers $\eta, \eta'$ the joint distribution of the
$n$-tuple $(X_\eta, \ldots, X_{\eta+n-1})$ is identical to that of the $n$-tuple
$(X_{\eta'}, \ldots, X_{\eta'+n-1})$:

\[
  (X_\eta, \ldots, X_{\eta+n-1}) \stackrel{\mathcal{L}}{=} (X_{\eta'}, \ldots, X_{\eta'+n-1}). \tag{13.1}
\]

Here $\stackrel{\mathcal{L}}{=}$ denotes equality of distribution (law) so
$X \stackrel{\mathcal{L}}{=} Y$ indicates that the random variables $X$ and $Y$ have the same
distribution; $(X, Y) \stackrel{\mathcal{L}}{=} (W, Z)$ indicates that the pair $(X, Y)$ and
the pair $(W, Z)$ have the same joint distribution; and similarly for $n$-tuples.

By considering the case where $n = 1$ we obtain that if $(X_\nu)$ is stationary, then the
distribution of $X_\eta$ is the same as the distribution of $X_{\eta'}$, for all
$\eta, \eta' \in \mathbb{Z}$. That is, if $(X_\nu)$ is stationary, then all the random
variables in the family $(X_\nu, \nu \in \mathbb{Z})$ have the same distribution: the random
variable $X_1$ has the same distribution as the random variable $X_2$, etc. Thus,

\[
  \bigl( (X_\nu, \nu \in \mathbb{Z}) \text{ stationary} \bigr) \implies
  \bigl( X_\nu \stackrel{\mathcal{L}}{=} X_1, \; \nu \in \mathbb{Z} \bigr). \tag{13.2}
\]
By considering in the above definition the case where $n = 2$ we obtain that for a
stationary process $(X_\nu)$ the joint distribution of $X_1, X_2$ is the same as the joint
distribution of $X_\eta, X_{\eta+1}$ for any integer $\eta$. More, however, is true. If
$(X_\nu)$ is stationary, then the joint distribution of $X_\nu, X_{\nu'}$ is the same as the
joint distribution of $X_{\eta+\nu}, X_{\eta+\nu'}$:

\[
  \bigl( (X_\nu, \nu \in \mathbb{Z}) \text{ stationary} \bigr) \implies
  \bigl( (X_\nu, X_{\nu'}) \stackrel{\mathcal{L}}{=} (X_{\eta+\nu}, X_{\eta+\nu'}), \;
  \nu, \nu', \eta \in \mathbb{Z} \bigr). \tag{13.3}
\]

To prove (13.3) first note that it suffices to treat the case where $\nu \ge \nu'$ because
$(X, Y) \stackrel{\mathcal{L}}{=} (W, Z)$ if, and only if,
$(Y, X) \stackrel{\mathcal{L}}{=} (Z, W)$. Next note that stationarity implies that

\[
  (X_{\nu'}, \ldots, X_\nu) \stackrel{\mathcal{L}}{=} (X_{\eta+\nu'}, \ldots, X_{\eta+\nu}) \tag{13.4}
\]

because both are $(\nu - \nu' + 1)$-length tuples of consecutive samples of the process.
Finally, (13.4) implies that the joint distribution of $(X_{\nu'}, X_\nu)$ is identical to
the joint distribution of $(X_{\eta+\nu'}, X_{\eta+\nu})$ and (13.3) follows.

The above argument can be generalized to more samples. This yields the following 
proposition, which gives an alternative definition of stationarity, a definition that 
more easily generalizes to continuous-time stochastic processes. 

Proposition 13.2.2. A discrete-time SP $(X_\nu, \nu \in \mathbb{Z})$ is stationary if, and
only if, for every $n \in \mathbb{N}$, all integers $\nu_1, \ldots, \nu_n \in \mathbb{Z}$, and
every $\eta \in \mathbb{Z}$

\[
  (X_{\nu_1}, \ldots, X_{\nu_n}) \stackrel{\mathcal{L}}{=} (X_{\eta+\nu_1}, \ldots, X_{\eta+\nu_n}). \tag{13.5}
\]

Proof. One direction is trivial and simply follows by substituting consecutive integers
for $\nu_1, \ldots, \nu_n$ in (13.5). The proof of the other direction is a straightforward
extension of the argument we used to prove (13.3). □
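The shift invariance in (13.5) can be checked exactly on a small example by enumerating outcomes. The process used below, $X_\nu = Z_\nu + Z_{\nu-1}$ with $(Z_\nu)$ IID fair bits, is a hypothetical illustration chosen here and does not appear in the text; the function name is likewise an assumption.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def joint_pmf(indices):
    """Exact joint pmf of (X_i, i in indices) where X_i = Z_i + Z_{i-1}, Z IID fair bits."""
    lo, hi = min(indices) - 1, max(indices)
    span = list(range(lo, hi + 1))            # all the Z's that influence the tuple
    weight = Fraction(1, 2 ** len(span))      # probability of each bit pattern
    pmf = Counter()
    for bits in product((0, 1), repeat=len(span)):
        z = dict(zip(span, bits))
        pmf[tuple(z[i] + z[i - 1] for i in indices)] += weight
    return dict(pmf)

# Non-consecutive epochs (nu_1, nu_2, nu_3) = (0, 1, 3) and their shift by eta = 5.
same_law = joint_pmf((0, 1, 3)) == joint_pmf((5, 6, 8))

# The joint law does depend on the spacing between the epochs.
different_spacing = joint_pmf((0, 1)) != joint_pmf((0, 2))
print(same_law, different_spacing)
```

Using `Fraction` keeps the probabilities exact, so the two laws are compared for equality rather than up to a tolerance.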

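By noting that $(W_1, \ldots, W_n) \stackrel{\mathcal{L}}{=} (Z_1, \ldots, Z_n)$ if, and only
if,$^1$ $\sum_j \alpha_j W_j \stackrel{\mathcal{L}}{=} \sum_j \alpha_j Z_j$ for all
$\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ we obtain the following equivalent
characterization of stationary processes:

Proposition 13.2.3. A discrete-time SP $(X_\nu)$ is stationary if, and only if, for every
$n \in \mathbb{N}$, all $\eta, \nu_1, \ldots, \nu_n \in \mathbb{Z}$, and all
$\alpha_1, \ldots, \alpha_n \in \mathbb{R}$

\[
  \sum_{j=1}^{n} \alpha_j X_{\nu_j} \stackrel{\mathcal{L}}{=} \sum_{j=1}^{n} \alpha_j X_{\nu_j + \eta}. \tag{13.6}
\]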



13.3 Wide-Sense Stationary Stochastic Processes 

Definition 13.3.1 (Wide-Sense Stationary Discrete-Time SP). We say that a
discrete-time SP $(X_\nu, \nu \in \mathbb{Z})$ is wide-sense stationary (WSS) or weakly
stationary or covariance stationary or second-order stationary or weak-sense stationary
if the following three conditions are satisfied:

1) The random variables $X_\nu$, $\nu \in \mathbb{Z}$ are all of finite variance:

\[
  \mathrm{Var}[X_\nu] < \infty, \quad \nu \in \mathbb{Z}. \tag{13.7a}
\]

2) The random variables $X_\nu$, $\nu \in \mathbb{Z}$ have identical means:

\[
  \mathrm{E}[X_\nu] = \mathrm{E}[X_1], \quad \nu \in \mathbb{Z}. \tag{13.7b}
\]

3) The quantity $\mathrm{E}[X_{\nu'} X_\nu]$ depends on $\nu'$ and $\nu$ only via $\nu - \nu'$:

\[
  \mathrm{E}[X_{\nu'} X_\nu] = \mathrm{E}[X_{\eta+\nu'} X_{\eta+\nu}], \quad \nu, \nu', \eta \in \mathbb{Z}. \tag{13.7c}
\]

1 This follows because the multivariate characteristic function determines the joint
distribution (see Proposition 23.4.4 or (Dudley, 2003, Chapter 9, Section 5, Theorem 9.5.1))
and because the characteristic functions of all the linear combinations of the components of
a random vector determine the multivariate characteristic function of the random vector
(Feller, 1971, Chapter XV, Section 7).

Note 13.3.2. By considering (13.7c) when $\nu = \nu'$ we obtain that all the samples
of a WSS SP have identical second moments. And since, by (13.7b), they also all
have identical means, it follows that all the samples of a WSS SP have identical
variances:

\[
  \bigl( (X_\nu, \nu \in \mathbb{Z}) \text{ WSS} \bigr) \implies
  \bigl( \mathrm{Var}[X_\nu] = \mathrm{Var}[X_1], \; \nu \in \mathbb{Z} \bigr). \tag{13.8}
\]

An alternative definition of a WSS process in terms of the variance of linear
functionals of the process is given below.

Proposition 13.3.3. A finite-variance discrete-time SP $(X_\nu)$ is WSS if, and only
if, for every $n \in \mathbb{N}$, every $\eta, \nu_1, \ldots, \nu_n \in \mathbb{Z}$, and every
$\alpha_1, \ldots, \alpha_n \in \mathbb{R}$

\[
  \sum_{j=1}^{n} \alpha_j X_{\nu_j} \quad \text{and} \quad \sum_{j=1}^{n} \alpha_j X_{\nu_j + \eta}
  \quad \text{have the same mean \& variance.} \tag{13.9}
\]

Proof. The proof is left as an exercise. Alternatively, see the proof of
Proposition 17.5.5. □

13.4 Stationarity and Wide-Sense Stationarity 

Comparing (13.9) with (13.6) we see that, for finite-variance stochastic processes, 
stationarity implies wide-sense stationarity, which is the content of the following 
proposition. This explains why stationary processes are sometimes called strong- 
sense stationary and why wide-sense stationary processes are sometimes called 
weak-sense stationary. 

Proposition 13.4.1 (Finite-Variance Stationary Stochastic Processes Are WSS). 

Every finite-variance discrete-time stationary SP is WSS. 

Proof. While this is obvious from (13.9) and (13.6) we shall nevertheless give an
alternative proof because the proof of Proposition 13.3.3 was left as an exercise. The
proof is straightforward and follows directly from (13.2) and (13.3) by noting that if
$X \stackrel{\mathcal{L}}{=} Y$, then $\mathrm{E}[X] = \mathrm{E}[Y]$ and that if
$(X, Y) \stackrel{\mathcal{L}}{=} (W, Z)$, then $\mathrm{E}[XY] = \mathrm{E}[WZ]$. □

It is not surprising that not every WSS process is stationary. Indeed, the definition
of WSS processes only involves means and covariances, so it cannot possibly say
everything regarding the distribution. For example, the process whose samples
are independent with the odd ones taking on the value $\pm 1$ equiprobably and with
the even ones uniformly distributed over the interval $[-\sqrt{3}, +\sqrt{3}]$ is WSS but
not stationary.
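One way to see why this example is WSS yet not stationary is to compare low-order moments of the odd and even samples. The sketch below assumes NumPy and uses a midpoint rule for the uniform law; the function names and the choice of the fourth moment as the distinguishing statistic are illustrative choices made here.

```python
import numpy as np

def moment_pm1(k):
    """E[X^k] for X taking the values -1 and +1 equiprobably (odd-indexed samples)."""
    return 0.5 * ((-1.0) ** k + 1.0 ** k)

def moment_uniform(k, a=np.sqrt(3.0), n=200001):
    """E[X^k] for X uniform on [-a, a], approximated by the midpoint rule."""
    edges = np.linspace(-a, a, n)
    mid = 0.5 * (edges[:-1] + edges[1:])
    return float(np.mean(mid ** k))

orders = (1, 2, 4)
odd = [moment_pm1(k) for k in orders]       # mean, second moment, fourth moment
even = [moment_uniform(k) for k in orders]
print(odd, even)
```

Both kinds of samples have zero mean and unit variance, and independence makes all cross-covariances zero, so (13.7a) through (13.7c) hold. The fourth moments differ (1 versus 9/5), so the samples are not identically distributed, which contradicts (13.2).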

13.5 The Autocovariance Function 

Definition 13.5.1 (Autocovariance Function). The autocovariance function
$K_{XX}\colon \mathbb{Z} \to \mathbb{R}$ of a WSS discrete-time SP $(X_\nu)$ is defined by

\[
  K_{XX}(\eta) \triangleq \mathrm{Cov}[X_{\nu+\eta}, X_\nu], \quad \eta \in \mathbb{Z}. \tag{13.10}
\]

Thus, the autocovariance function at $\eta$ is the covariance between two samples of
the process taken $\eta$ units of time apart. Note that because $(X_\nu)$ is WSS, the RHS
of (13.10) does not depend on $\nu$. Also, for WSS processes all samples are of equal
mean (13.7b), so

\[
  K_{XX}(\eta) = \mathrm{Cov}[X_{\nu+\eta}, X_\nu]
  = \mathrm{E}[X_{\nu+\eta} X_\nu] - \bigl( \mathrm{E}[X_1] \bigr)^2, \quad \eta \in \mathbb{Z}.
\]

In some engineering texts the autocovariance function is called "autocorrelation
function." We prefer the former because $K_{XX}(\eta)$ does not measure the correlation
coefficient between $X_\nu$ and $X_{\nu+\eta}$ but rather the covariance. These concepts are
different also for zero-mean processes. Following (Grimmett and Stirzaker, 2001)
we define the autocorrelation function of a WSS process of nonzero variance as

\[
  \rho_{XX}(\eta) \triangleq \frac{\mathrm{Cov}[X_{\nu+\eta}, X_\nu]}{\mathrm{Var}[X_1]},
  \quad \eta \in \mathbb{Z}, \tag{13.11}
\]

i.e., as the correlation coefficient between $X_{\nu+\eta}$ and $X_\nu$. (Recall that for a
WSS process all samples are of the same variance (13.8), so for such a process the
denominator in (13.11) is equal to $\sqrt{\mathrm{Var}[X_\nu] \, \mathrm{Var}[X_{\nu+\eta}]}$.)

Not every function from the integers to the reals is the autocovariance function of
some WSS SP. For example, the autocovariance function must be symmetric in the
sense that

$$K_{XX}(-\eta) = K_{XX}(\eta), \quad \eta \in \mathbb{Z}, \tag{13.12}$$

because, by (13.10),

$$\begin{aligned}
K_{XX}(\eta) &= \operatorname{Cov}[X_{\nu+\eta}, X_\nu] \\
&= \operatorname{Cov}[X_{\tilde{\nu}}, X_{\tilde{\nu}-\eta}] \\
&= \operatorname{Cov}[X_{\tilde{\nu}-\eta}, X_{\tilde{\nu}}] \\
&= K_{XX}(-\eta), \quad \eta \in \mathbb{Z},
\end{aligned}$$

where in the second equality we defined $\tilde{\nu} \triangleq \nu + \eta$, and where in the third
equality we used the fact that for real random variables the covariance is symmetric:
$\operatorname{Cov}[X, Y] = \operatorname{Cov}[Y, X]$.

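Another property that the autocovariance function must satisfy is

$$\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} K_{XX}(\nu - \nu') \ge 0, \quad \alpha_1, \ldots, \alpha_n \in \mathbb{R}, \tag{13.13}$$

because

$$\begin{aligned}
\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} K_{XX}(\nu - \nu')
&= \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} \operatorname{Cov}[X_\nu, X_{\nu'}] \\
&= \operatorname{Cov}\Biggl[\sum_{\nu=1}^{n} \alpha_\nu X_\nu, \sum_{\nu'=1}^{n} \alpha_{\nu'} X_{\nu'}\Biggr] \\
&= \operatorname{Var}\Biggl[\sum_{\nu=1}^{n} \alpha_\nu X_\nu\Biggr] \\
&\ge 0.
\end{aligned}$$
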

It turns out that (13.12) and (13.13) fully characterize the autocovariance functions 
of discrete-time WSS stochastic processes in a sense that is made precise in the 
following theorem. 

Theorem 13.5.2 (Characterizing Autocovariance Functions). 

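(i) If $K_{XX}$ is the autocovariance function of some discrete-time WSS SP $(X_\nu)$,
then $K_{XX}$ must satisfy (13.12) & (13.13).

(ii) If $K\colon \mathbb{Z} \to \mathbb{R}$ is some function satisfying

$$K(-\eta) = K(\eta), \quad \eta \in \mathbb{Z} \tag{13.14}$$

and

$$\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} K(\nu - \nu') \ge 0, \quad n \in \mathbb{N},\ \alpha_1, \ldots, \alpha_n \in \mathbb{R}, \tag{13.15}$$

then there exists a discrete-time WSS SP $(X_\nu)$ whose autocovariance function
$K_{XX}$ is given by $K_{XX}(\eta) = K(\eta)$ for all $\eta \in \mathbb{Z}$.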

Proof. We have already proved Part (i). For a proof of Part (ii) see, for example,
(Doob, 1990, Chapter X, § 3, Theorem 3.1) or (Pourahmadi, 2001, Theorem 5.1 in
Section 5.1 and Section 9.7).² □

A function $K\colon \mathbb{Z} \to \mathbb{R}$ satisfying (13.14) & (13.15) is called a positive definite
function. Such functions have been extensively studied in the literature, and in
Section 13.7 we shall give an alternative characterization of autocovariance functions
based on these studies. But first we introduce the power spectral density.
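The quadratic-form condition (13.15) can be spot-checked numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, `K_geometric` (K(η) = ρ^|η| with ρ = 0.5) is a standard valid autocovariance, and `K_bad` is a symmetric candidate chosen here to fail the condition. Random trials can only refute positive definiteness, never prove it.

```python
import random

def quadratic_form(K, alphas):
    """Left-hand side of (13.15): sum over nu, nu' of a_nu * a_nu' * K(nu - nu')."""
    n = len(alphas)
    return sum(alphas[i] * alphas[j] * K(i - j) for i in range(n) for j in range(n))

def K_geometric(eta, rho=0.5):
    """K(eta) = rho^|eta|: symmetric, and a valid autocovariance function."""
    return rho ** abs(eta)

def K_bad(eta):
    """Symmetric three-tap candidate that violates (13.15)."""
    return {0: 1.0, 1: 0.9, 2: -0.9}.get(abs(eta), 0.0)

rng = random.Random(0)  # fixed seed so the spot check is reproducible
worst = min(
    quadratic_form(K_geometric, [rng.uniform(-1, 1) for _ in range(6)])
    for _ in range(200)
)
print("smallest form found for rho^|eta|:", worst)
print("form for K_bad with alpha = (1, -1, 1):", quadratic_form(K_bad, [1.0, -1.0, 1.0]))
```

For `K_bad` the choice α = (1, −1, 1) gives 3 − 3.6 − 1.8 = −2.4, so by Theorem 13.5.2 (i) this function is not the autocovariance function of any WSS SP, even though it satisfies (13.14).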



² For the benefit of readers who have already encountered Gaussian stochastic processes, we
mention here that if $K(\cdot)$ satisfies (13.14) & (13.15) then we can even find a Gaussian SP whose
autocovariance function is equal to $K(\cdot)$.

13.6 The Power Spectral Density Function

Roughly speaking, the power spectral density (PSD) of a discrete-time WSS
SP $(X_\nu)$ of autocovariance function $K_{XX}$ is an integrable function on the interval
$[-1/2, 1/2)$ whose $\eta$-th Fourier Series Coefficient is equal to $K_{XX}(\eta)$. Such a function
does not always exist. When it does, it is unique in the sense that any two such
functions can only differ on a subset of the interval $[-1/2, 1/2)$ of Lebesgue measure
zero. (This follows because integrable functions on the interval $[-1/2, 1/2)$ that
have identical Fourier Series Coefficients can differ only on a subset of $[-1/2, 1/2)$
of Lebesgue measure zero; see Theorem A.2.3.) Consequently, we shall speak of
"the" PSD but try to remember that this does not always exist and that, when it
does, it is only unique in this restricted sense.

Definition 13.6.1 (Power Spectral Density). We say that the discrete-time WSS
SP $(X_\nu)$ is of power spectral density $S_{XX}$ if $S_{XX}$ is an integrable mapping
from the interval $[-1/2, 1/2)$ to the reals such that

$$K_{XX}(\eta) = \int_{-1/2}^{1/2} S_{XX}(\theta)\, e^{-i2\pi\eta\theta} \, d\theta, \quad \eta \in \mathbb{Z}. \tag{13.16}$$

But see also Note 13.6.5 ahead.

Note 13.6.2. We shall sometimes abuse notation and, rather than say that the
stochastic process $(X_\nu, \nu \in \mathbb{Z})$ is of PSD $S_{XX}$, we shall say that the autocovariance
function $K_{XX}$ is of PSD $S_{XX}$.

By considering the special case of $\eta = 0$ in (13.16) we obtain that

$$\operatorname{Var}[X_\nu] = K_{XX}(0) = \int_{-1/2}^{1/2} S_{XX}(\theta) \, d\theta, \quad \nu \in \mathbb{Z}. \tag{13.17}$$

The main result of the following proposition is that power spectral densities are 
nonnegative (except possibly on a set of Lebesgue measure zero). 

Proposition 13.6.3 (PSDs Are Nonnegative and Symmetric). 

(i) If the WSS SP $(X_\nu, \nu \in \mathbb{Z})$ of autocovariance $K_{XX}$ is of PSD $S_{XX}$, then,
except on subsets of $(-1/2, 1/2)$ of Lebesgue measure zero,

$$S_{XX}(\theta) \ge 0 \tag{13.18}$$

and

$$S_{XX}(\theta) = S_{XX}(-\theta). \tag{13.19}$$

(ii) If the function $S\colon [-1/2, 1/2) \to \mathbb{R}$ is integrable, nonnegative, and symmetric
(in the sense that $S(\theta) = S(-\theta)$ for all $\theta \in (-1/2, 1/2)$), then there exists a
WSS SP $(X_\nu)$ whose PSD $S_{XX}$ is given by

$$S_{XX}(\theta) = S(\theta), \quad \theta \in [-1/2, 1/2).$$

Proof. The nonnegativity of the PSD (13.18) will be established later in the more
general setting of complex stochastic processes (Proposition 17.5.7 ahead). Here we
only prove the symmetry (13.19) and establish the second half of the proposition.

That (13.19) holds (except on a set of Lebesgue measure zero) follows because $K_{XX}$
is symmetric. Indeed, for any $\eta \in \mathbb{Z}$ we have

$$\begin{aligned}
\int_{-1/2}^{1/2} \bigl(S_{XX}(\theta) - S_{XX}(-\theta)\bigr)\, e^{-i2\pi\eta\theta} \, d\theta
&= \int_{-1/2}^{1/2} S_{XX}(\theta)\, e^{-i2\pi\eta\theta} \, d\theta - \int_{-1/2}^{1/2} S_{XX}(-\theta)\, e^{-i2\pi\eta\theta} \, d\theta \\
&= K_{XX}(\eta) - \int_{-1/2}^{1/2} S_{XX}(\tilde{\theta})\, e^{-i2\pi(-\eta)\tilde{\theta}} \, d\tilde{\theta} \\
&= K_{XX}(\eta) - K_{XX}(-\eta) \\
&= 0, \quad \eta \in \mathbb{Z}.
\end{aligned} \tag{13.20}$$

Consequently, all the Fourier Series Coefficients of the function
$\theta \mapsto S_{XX}(\theta) - S_{XX}(-\theta)$ are zero, thus establishing that this function is zero except
on a set of Lebesgue measure zero (Theorem A.2.3).

We next prove that if the function $S\colon [-1/2, 1/2) \to \mathbb{R}$ is symmetric, nonnegative,
and integrable, then it is the PSD of some WSS real SP. We cheat a bit because
our proof relies on Theorem 13.5.2, which we never proved. From Theorem 13.5.2
it follows that it suffices to establish that the sequence $K\colon \mathbb{Z} \to \mathbb{R}$ defined by

$$K(\eta) = \int_{-1/2}^{1/2} S(\theta)\, e^{-i2\pi\eta\theta} \, d\theta, \quad \eta \in \mathbb{Z} \tag{13.21}$$

satisfies (13.14) & (13.15).

Verifying (13.14) is straightforward: by hypothesis, $S(\cdot)$ is symmetric so

$$\begin{aligned}
K(-\eta) &= \int_{-1/2}^{1/2} S(\theta)\, e^{-i2\pi(-\eta)\theta} \, d\theta \\
&= \int_{-1/2}^{1/2} S(-\varphi)\, e^{-i2\pi\eta\varphi} \, d\varphi \\
&= \int_{-1/2}^{1/2} S(\varphi)\, e^{-i2\pi\eta\varphi} \, d\varphi \\
&= K(\eta), \quad \eta \in \mathbb{Z},
\end{aligned}$$

where the first equality follows from (13.21); the second from the change of variable
$\varphi \triangleq -\theta$; the third from the symmetry of $S(\cdot)$, which implies that $S(-\varphi) = S(\varphi)$;
and the last equality again from (13.21).

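We next verify (13.15). To this end we fix arbitrary $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ and compute

$$\begin{aligned}
\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} K(\nu - \nu')
&= \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} \int_{-1/2}^{1/2} S(\theta)\, e^{-i2\pi(\nu-\nu')\theta} \, d\theta \\
&= \int_{-1/2}^{1/2} S(\theta) \Biggl(\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'}\, e^{-i2\pi(\nu-\nu')\theta}\Biggr) d\theta \\
&= \int_{-1/2}^{1/2} S(\theta) \Biggl(\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu e^{-i2\pi\nu\theta}\, \alpha_{\nu'} e^{i2\pi\nu'\theta}\Biggr) d\theta \\
&= \int_{-1/2}^{1/2} S(\theta) \Biggl(\sum_{\nu=1}^{n} \alpha_\nu e^{-i2\pi\nu\theta}\Biggr) \Biggl(\sum_{\nu'=1}^{n} \alpha_{\nu'} e^{-i2\pi\nu'\theta}\Biggr)^{\!*} d\theta \\
&= \int_{-1/2}^{1/2} S(\theta)\, \Biggl|\sum_{\nu=1}^{n} \alpha_\nu e^{-i2\pi\nu\theta}\Biggr|^2 d\theta \\
&\ge 0,
\end{aligned} \tag{13.22}$$

where the first equality follows from (13.21); the subsequent equalities by simple
algebraic manipulation; and the final inequality from the nonnegativity of $S(\cdot)$. □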

Corollary 13.6.4. If a discrete-time WSS SP $(X_\nu)$ has a PSD, then it also has a
PSD $S_{XX}$ for which (13.18) holds for every $\theta \in [-1/2, 1/2)$ and for which (13.19)
holds for every $\theta \in (-1/2, 1/2)$ (and not only outside subsets of Lebesgue measure
zero).

Proof. Suppose that $(X_\nu)$ is of PSD $S_{XX}$. Define the mapping $S\colon [-1/2, 1/2) \to \mathbb{R}$
by³

$$S(\theta) = \begin{cases} \frac{1}{2}\bigl(|S_{XX}(\theta)| + |S_{XX}(-\theta)|\bigr) & \text{if } \theta \in (-1/2, 1/2), \\ 1 & \text{if } \theta = -1/2. \end{cases} \tag{13.23}$$

By the proposition, $S_{XX}$ and $S(\cdot)$ differ only on a set of Lebesgue measure zero,
so they must have identical Fourier Series Coefficients. Since the Fourier Series
Coefficients of $S_{XX}$ agree with $K_{XX}$, it follows that so must those of $S(\cdot)$. Thus, $S(\cdot)$
is a PSD for $(X_\nu)$, and it is by (13.23) nonnegative on $[-1/2, 1/2)$ and symmetric
on $(-1/2, 1/2)$. □

Note 13.6.5. In view of Corollary 13.6.4 we shall only say that $(X_\nu)$ is of PSD $S_{XX}$
if the function $S_{XX}$, in addition to being integrable and to satisfying (13.16), is
also nonnegative and symmetric.

As we have noted, not every WSS SP has a PSD. For example, the process defined
by

$$X_\nu = X, \quad \nu \in \mathbb{Z},$$

where $X$ is some zero-mean unit-variance random variable has the all-one
autocovariance function $K_{XX}(\eta) = 1$, $\eta \in \mathbb{Z}$, and this all-one sequence cannot be
the Fourier Series Coefficients sequence of an integrable function because, by the
Riemann-Lebesgue lemma (Theorem A.2.4), the Fourier Series Coefficients of an
integrable function must converge to zero.⁴

³ Our choice of $S(-1/2)$ as 1 is arbitrary; any nonnegative value would do.

⁴ One could say that the PSD of this process is Dirac's Delta, but we shall refrain from doing
so because we do not use Dirac's Delta in this book and because there is not much to be gained
from this. (There exist processes that do not have a PSD even if one allows for Dirac's Deltas.)

In general, it is very difficult to characterize the autocovariance functions having
a PSD. We know by the Riemann-Lebesgue lemma that such autocovariance functions
must tend to zero, but this necessary condition is not sufficient. A very useful
sufficient (but not necessary) condition is the following:

Proposition 13.6.6 (PSD when $K_{XX}$ Is Absolutely Summable). If the autocovariance
function $K_{XX}$ is absolutely summable, i.e.,

$$\sum_{\eta=-\infty}^{\infty} |K_{XX}(\eta)| < \infty, \tag{13.24}$$

then the function

$$S(\theta) = \sum_{\eta=-\infty}^{\infty} K_{XX}(\eta)\, e^{i2\pi\eta\theta}, \quad \theta \in [-1/2, 1/2] \tag{13.25}$$

is continuous, symmetric, nonnegative, and satisfies

$$\int_{-1/2}^{1/2} S(\theta)\, e^{-i2\pi\eta\theta} \, d\theta = K_{XX}(\eta), \quad \eta \in \mathbb{Z}. \tag{13.26}$$

Consequently, $S(\cdot)$ is a PSD for $K_{XX}$.

Proof. First note that because $|K_{XX}(\eta)\, e^{i2\pi\eta\theta}| = |K_{XX}(\eta)|$, it follows that (13.24)
guarantees that the sum in (13.25) converges uniformly and absolutely. And since
each term in the sum is a continuous function, the uniform convergence of the
sum guarantees that $S(\cdot)$ is continuous (Rudin, 1976, Chapter 7, Theorem 7.12).
Consequently,

$$\int_{-1/2}^{1/2} |S(\theta)| \, d\theta < \infty, \tag{13.27}$$

and it is meaningful to discuss the Fourier Series Coefficients of $S(\cdot)$.

We next prove that the Fourier Series Coefficients of $S(\cdot)$ are equal to $K_{XX}$, i.e.,
that (13.26) holds. This can be shown by swapping integration and summation
and using the orthonormality property

$$\int_{-1/2}^{1/2} e^{i2\pi(\eta - \eta')\theta} \, d\theta = \mathrm{I}\{\eta = \eta'\}, \quad \eta, \eta' \in \mathbb{Z} \tag{13.28}$$

as follows:

$$\begin{aligned}
\int_{-1/2}^{1/2} S(\theta)\, e^{-i2\pi\eta\theta} \, d\theta
&= \int_{-1/2}^{1/2} \Biggl(\sum_{\eta'=-\infty}^{\infty} K_{XX}(\eta')\, e^{i2\pi\eta'\theta}\Biggr) e^{-i2\pi\eta\theta} \, d\theta \\
&= \sum_{\eta'=-\infty}^{\infty} K_{XX}(\eta') \int_{-1/2}^{1/2} e^{i2\pi\eta'\theta}\, e^{-i2\pi\eta\theta} \, d\theta \\
&= \sum_{\eta'=-\infty}^{\infty} K_{XX}(\eta') \int_{-1/2}^{1/2} e^{i2\pi(\eta' - \eta)\theta} \, d\theta \\
&= \sum_{\eta'=-\infty}^{\infty} K_{XX}(\eta')\, \mathrm{I}\{\eta' = \eta\} \\
&= K_{XX}(\eta), \quad \eta \in \mathbb{Z}.
\end{aligned}$$

It remains to show that $S(\cdot)$ is symmetric, i.e., that $S(\theta) = S(-\theta)$, and that it is
nonnegative. The symmetry of $S(\cdot)$ follows directly from its definition (13.25) and
from the fact that $K_{XX}$, like every autocovariance function, is symmetric
(Theorem 13.5.2 (i)).

We next prove that $S(\cdot)$ is nonnegative. From (13.26) it follows that $S(\cdot)$ can
only be negative on a subset of the interval $[-1/2, 1/2)$ of Lebesgue measure zero
(Proposition 13.6.3 (i)). And since $S(\cdot)$ is continuous, this implies that $S(\cdot)$ is
nonnegative. □
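Proposition 13.6.6 can be illustrated on a finitely supported autocovariance. The sketch below assumes the example K(0) = 1, K(±1) = 0.5 (not from the text), for which (13.25) gives S(θ) = 1 + cos(2πθ). It then recovers the Fourier Series Coefficients of (13.26) with a midpoint rule, which is exact up to rounding for trigonometric polynomials of degree below the number of grid points. The function names are illustrative.

```python
import cmath
import math

K = {0: 1.0, 1: 0.5, -1: 0.5}  # assumed example; absolutely summable (finite support)

def S(theta: float) -> float:
    """(13.25): S(theta) = sum_eta K(eta) exp(i 2 pi eta theta); real since K is symmetric."""
    return sum(k * cmath.exp(2j * math.pi * eta * theta) for eta, k in K.items()).real

def fourier_coefficient(eta: int, n: int = 64) -> complex:
    """Midpoint-rule approximation of the integral in (13.26) over [-1/2, 1/2)."""
    total = 0j
    for j in range(n):
        theta = -0.5 + (j + 0.5) / n
        total += S(theta) * cmath.exp(-2j * math.pi * eta * theta)
    return total / n

for eta in range(-2, 3):
    print(eta, round(fourier_coefficient(eta).real, 12))
print("min of S on the grid:", min(S(-0.5 + j / 200) for j in range(200)))
```

The recovered coefficients match K, and S is nonnegative and symmetric on the grid, as the proposition asserts.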

13.7 The Spectral Distribution Function 

We next briefly discuss the case where $(X_\nu)$ does not necessarily have a power
spectral density function. We shall see that in this case too we can express the
autocovariance function as the Fourier Series of "something," but this "something"
is not an integrable function. (It is, in fact, a measure.) The theorem will also yield
a characterization of nonnegative definite functions. The proof, which is based on
Herglotz's Theorem, is omitted. The results of this section will not be used in
subsequent chapters.

Recall that a random variable $X$ taking value in the interval $[-a, a]$ is said to be
symmetric (or to have a symmetric distribution) if $\Pr[X \le -\xi] = \Pr[X \ge \xi]$ for
all $\xi \in [-a, a]$.

Theorem 13.7.1. A function $\rho\colon \mathbb{Z} \to \mathbb{R}$ is the autocorrelation function of a real
WSS SP if, and only if, there exists a symmetric random variable $\Theta$ taking value
in the interval $[-1/2, 1/2]$ such that

$$\rho(\eta) = \mathsf{E}\bigl[e^{-i2\pi\eta\Theta}\bigr], \quad \eta \in \mathbb{Z}. \tag{13.29}$$

The cumulative distribution function of $\Theta$ is fully determined by $\rho$.

Proof. See (Doob, 1990, Chapter X, § 3, Theorem 3.2), (Pourahmadi, 2001,
Theorem 9.22), (Shiryaev, 1996, Chapter VI, § 1.1), or (Porat, 2008, Section 2.8). □

This theorem also characterizes autocovariance functions: a function $K\colon \mathbb{Z} \to \mathbb{R}$
is the autocovariance function of a real WSS SP if, and only if, there exists a
symmetric random variable $\Theta$ taking value in the interval $[-1/2, 1/2]$ and some
constant $\alpha \ge 0$ such that

$$K(\eta) = \alpha\, \mathsf{E}\bigl[e^{-i2\pi\eta\Theta}\bigr], \quad \eta \in \mathbb{Z}. \tag{13.30}$$

(By equating (13.30) at $\eta = 0$ we obtain that $\alpha = K(0)$, i.e., the variance of the
stochastic process.)
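Two extreme choices of $\Theta$ may help fix ideas; they are added here as illustrations under the stated assumptions and are not part of the original development. If $\Theta$ is deterministically zero (a symmetric distribution), then (13.30) yields the constant sequence

$$K(\eta) = \alpha\, \mathsf{E}\bigl[e^{-i2\pi\eta \cdot 0}\bigr] = \alpha, \quad \eta \in \mathbb{Z},$$

which for $\alpha = 1$ is the all-one autocovariance function of the process $X_\nu = X$ that was shown in Section 13.6 to have no PSD. If instead $\Theta$ is uniformly distributed over $[-1/2, 1/2]$, then

$$K(\eta) = \alpha \int_{-1/2}^{1/2} e^{-i2\pi\eta\theta} \, d\theta = \alpha\, \mathrm{I}\{\eta = 0\}, \quad \eta \in \mathbb{Z},$$

which is the autocovariance function of a process whose samples are uncorrelated and of variance $\alpha$.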




Equivalently, we can state the theorem as follows. If $(X_\nu)$ is a real WSS SP, then
its autocovariance function $K_{XX}$ can be expressed as

$$K_{XX}(\eta) = \operatorname{Var}[X_1]\, \mathsf{E}\bigl[e^{-i2\pi\eta\Theta}\bigr], \quad \eta \in \mathbb{Z} \tag{13.31}$$

for some random variable $\Theta$ taking value in the interval $[-1/2, 1/2]$ according to
some symmetric distribution. If, additionally, $\operatorname{Var}[X_1] > 0$, then the cumulative
distribution function $F_\Theta(\cdot)$ of $\Theta$ is uniquely determined by $K_{XX}$.

Note 13.7.2.

(i) If the random variable $\Theta$ above has a symmetric density $f_\Theta(\cdot)$, then the
process is of PSD $\theta \mapsto \operatorname{Var}[X_1]\, f_\Theta(\theta)$. Indeed, by (13.31) we have for every
integer $\eta$

$$\begin{aligned}
K_{XX}(\eta) &= \operatorname{Var}[X_1]\, \mathsf{E}\bigl[e^{-i2\pi\eta\Theta}\bigr] \\
&= \operatorname{Var}[X_1] \int_{-1/2}^{1/2} f_\Theta(\theta)\, e^{-i2\pi\eta\theta} \, d\theta \\
&= \int_{-1/2}^{1/2} \bigl(\operatorname{Var}[X_1]\, f_\Theta(\theta)\bigr)\, e^{-i2\pi\eta\theta} \, d\theta.
\end{aligned}$$

(ii) Some authors, e.g., (Grimmett and Stirzaker, 2001) refer to the cumulative
distribution function $F_\Theta(\cdot)$ of $\Theta$, i.e., to the mapping $\theta \mapsto \Pr[\Theta \le \theta]$, as
the Spectral Distribution Function of $(X_\nu)$. This, however, is not standard.
It is only in agreement with the more common usage in the case where
$\operatorname{Var}[X_1] = 1$.⁵


13.8 Exercises 

Exercise 13.1 (Discrete-Time WSS Stochastic Processes). Prove Proposition 13.3.3. 

Exercise 13.2 (Mapping a Discrete-Time Stationary SP). Let $(X_\nu)$ be a stationary
discrete-time SP, and let $g\colon \mathbb{R} \to \mathbb{R}$ be some arbitrary (Borel measurable) function. For
every $\nu \in \mathbb{Z}$, let $Y_\nu = g(X_\nu)$. Prove that the discrete-time SP $(Y_\nu)$ is stationary.

Exercise 13.3 (Mapping a Discrete-Time WSS SP). Let $(X_\nu)$ be a WSS discrete-time
SP, and let $g\colon \mathbb{R} \to \mathbb{R}$ be some arbitrary (Borel measurable) bounded function. For every
$\nu \in \mathbb{Z}$, let $Y_\nu = g(X_\nu)$. Must the SP $(Y_\nu)$ be WSS?

Exercise 13.4 (A Sliding-Window Mapping of a Stationary SP). Let $(X_\nu)$ be a stationary
discrete-time SP, and let $g\colon \mathbb{R}^2 \to \mathbb{R}$ be some arbitrary (Borel measurable) function. For
every $\nu \in \mathbb{Z}$ define $Y_\nu = g(X_{\nu-1}, X_\nu)$. Must $(Y_\nu)$ be stationary?

⁵ The more common definition is that $\theta \mapsto \operatorname{Var}[X_1] \Pr[\Theta \le \theta]$ is the spectral measure or
spectral distribution function. But this is not a distribution function in the probabilistic sense
because its value at $\theta = \infty$ is $\operatorname{Var}[X_1]$, which may be different from one.

Exercise 13.5 (A Sliding-Window Mapping of a WSS SP). Let $(X_\nu)$ be a WSS
discrete-time SP, and let $g\colon \mathbb{R}^2 \to \mathbb{R}$ be some arbitrary bounded (Borel measurable) function. For
every $\nu \in \mathbb{Z}$ define $Y_\nu = g(X_{\nu-1}, X_\nu)$. Must $(Y_\nu)$ be WSS?

Exercise 13.6 (Existence of a SP). For which values of $\alpha, \beta \in \mathbb{R}$ is the function

$$K_{XX}(m) = \begin{cases} 1 & \text{if } m = 0, \\ \alpha & \text{if } m = 1, \\ \beta & \text{if } m = -1, \\ 0 & \text{otherwise}, \end{cases} \quad m \in \mathbb{Z}$$

the autocovariance function of some WSS SP $(X_\nu, \nu \in \mathbb{Z})$?

Exercise 13.7 (Dilating a Stationary SP). Let $(X_\nu)$ be a stationary discrete-time SP, and
define $Y_\nu = X_{2\nu}$ for every $\nu \in \mathbb{Z}$. Must $(Y_\nu)$ be stationary?

Exercise 13.8 (Inserting Zeros Periodically). Let $(X_\nu)$ be a stationary discrete-time SP,
and let the RV $U$ be independent of it and take on the values 0 and 1 equiprobably. Define
for every $\nu \in \mathbb{Z}$

$$Y_\nu = \begin{cases} 0 & \text{if } \nu \text{ is odd}, \\ X_{\nu/2} & \text{if } \nu \text{ is even} \end{cases} \quad \text{and} \quad Z_\nu = Y_{\nu+U}. \tag{13.32}$$

Under what conditions is $(Y_\nu)$ stationary? Under what conditions is $(Z_\nu)$ stationary?

Exercise 13.9 (The Autocovariance Function of a Dilated WSS SP). Let $(X_\nu)$ be a WSS
discrete-time SP of autocovariance function $K_{XX}$. Define $Y_\nu = X_{2\nu}$ for every $\nu \in \mathbb{Z}$. Must
$(Y_\nu)$ be WSS? If so, express its autocovariance function $K_{YY}$ in terms of $K_{XX}$.

Exercise 13.10 (Inserting Zeros Periodically: the Autocovariance Function). Let $(X_\nu)$ be
a WSS discrete-time SP of autocovariance function $K_{XX}$, and let the RV $U$ be independent
of it and take on the values 0 and 1 equiprobably. Define $(Z_\nu)$ as in (13.32). Must $(Z_\nu)$
be WSS? If yes, express its autocovariance function in terms of $K_{XX}$.

Exercise 13.11 (Stationary But Not WSS). Construct a discrete-time stationary SP that
is not WSS.

Exercise 13.12 (Complex Coefficients). Show that (13.13) will hold for complex numbers
$\alpha_1, \ldots, \alpha_n$ provided that we replace the product $\alpha_\nu \alpha_{\nu'}$ with $\alpha_\nu \alpha_{\nu'}^*$. That is, show that if
$K_{XX}$ is the autocovariance function of a real discrete-time WSS SP, then

$$\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'}^* K_{XX}(\nu - \nu') \ge 0, \quad \alpha_1, \ldots, \alpha_n \in \mathbb{C}.$$



Chapter 14 

Energy and Power in PAM 

14.1 Introduction 

Energy is an important resource in Digital Communications. The rate at which 
it is transmitted — the "transmit power" — is critical in battery-operated devices. 
In satellite applications it is a major consideration in determining the size of the 
required solar panels, and in wireless systems it influences the interference that one 
system causes to another. In this chapter we shall discuss the power in PAM signals. 
To define power we shall need some modeling trickery which will allow us to pretend 
that the system has been operating since "time $-\infty$" and that it will continue
to operate indefinitely. Our definitions and derivations will be mathematically 
somewhat informal. A more formal account for readers with background in Measure 
Theory is provided in Section 14.6. 

Before discussing power we begin with a discussion of the expected energy in trans- 
mitting a finite number of bits. 



14.2 Energy in PAM 

We begin with a seemingly completely artificial problem. Suppose that K independent
data bits $D_1, \ldots, D_K$, each taking on the values 0 and 1 equiprobably,
are mapped by a mapping $\operatorname{enc}\colon \{0, 1\}^K \to \mathbb{R}^N$ to an N-tuple of real numbers
$(X_1, \ldots, X_N)$, where $X_\ell$ is the $\ell$-th component of the N-tuple $\operatorname{enc}(D_1, \ldots, D_K)$.
Suppose further that the symbols $X_1, \ldots, X_N$ are then mapped to the waveform

$$X(t) = A \sum_{\ell=1}^{N} X_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}, \tag{14.1}$$

where $g \in \mathcal{L}_2$ is an energy-limited real pulse shape, $A \ge 0$ is a scaling factor, and
$T_s > 0$ is the baud period. We seek the expected energy in the waveform $X(\cdot)$.

We assume that $X(\cdot)$ corresponds to the voltage across a unit-load or to the current
through a unit-load, so the transmitted energy is the time integral of the mapping
$t \mapsto X^2(t)$. Because the data bits are random variables, the signal $X(\cdot)$ is a
stochastic process. Its energy $\int_{-\infty}^{\infty} X^2(t) \, dt$ is thus a random variable.¹ If $(\Omega, \mathcal{F}, P)$
is the probability space under consideration, then this RV is the mapping from $\Omega$
to $\mathbb{R}$ defined by

$$\omega \mapsto \int_{-\infty}^{\infty} X^2(\omega, t) \, dt.$$

This RV's expectation (the expected energy) is denoted by $E$ and is given by

$$E \triangleq \mathsf{E}\biggl[\int_{-\infty}^{\infty} X^2(t) \, dt\biggr]. \tag{14.2}$$

Note that even though we are considering the transmission of a finite number of
symbols (N), the waveform $X(\cdot)$ may extend in time from $-\infty$ to $+\infty$.

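We next derive an explicit expression for $E$. Starting from (14.2) and using (14.1),

$$\begin{aligned}
E &= \mathsf{E}\biggl[\int_{-\infty}^{\infty} X^2(t) \, dt\biggr] \\
&= A^2\, \mathsf{E}\Biggl[\int_{-\infty}^{\infty} \Biggl(\sum_{\ell=1}^{N} X_\ell\, g(t - \ell T_s)\Biggr)^{\!2} dt\Biggr] \\
&= A^2\, \mathsf{E}\Biggl[\int_{-\infty}^{\infty} \Biggl(\sum_{\ell=1}^{N} X_\ell\, g(t - \ell T_s)\Biggr) \Biggl(\sum_{\ell'=1}^{N} X_{\ell'}\, g(t - \ell' T_s)\Biggr) dt\Biggr] \\
&= A^2\, \mathsf{E}\Biggl[\int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} X_\ell X_{\ell'}\, g(t - \ell T_s)\, g(t - \ell' T_s) \, dt\Biggr] \\
&= A^2 \int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[X_\ell X_{\ell'}]\, g(t - \ell T_s)\, g(t - \ell' T_s) \, dt \\
&= A^2 \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[X_\ell X_{\ell'}] \int_{-\infty}^{\infty} g(t - \ell T_s)\, g(t - \ell' T_s) \, dt \\
&= A^2 \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[X_\ell X_{\ell'}]\, R_{gg}\bigl((\ell - \ell') T_s\bigr),
\end{aligned} \tag{14.3}$$

where $R_{gg}$ is the self-similarity function of the pulse $g(\cdot)$ (Section 11.2). Here the
first equality follows from (14.2); the second from (14.1); the third by writing the
square of a number as its product with itself ($\xi^2 = \xi\xi$); the fourth by writing the
product of sums as the double sum of products; the fifth by swapping expectation
with integration and by the linearity of expectation; the sixth by swapping integration
and summation; and the final equality by the definition of the self-similarity
function (Definition 11.2.1).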
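Formula (14.3) is easy to evaluate once the symbol correlations and the self-similarity function are known. The sketch below is an illustration under assumptions not made in the text: the pulse is taken to be 1 on [0, 2T_s) and 0 elsewhere, so that R_gg(τ) = max(0, 2T_s − |τ|), and two symbol models are compared. The function names are hypothetical.

```python
def R_gg(tau: float, Ts: float = 1.0) -> float:
    """Self-similarity function of the assumed pulse g = 1 on [0, 2*Ts), 0 elsewhere."""
    return max(0.0, 2.0 * Ts - abs(tau))

def expected_energy(corr, N: int, A: float = 1.0, Ts: float = 1.0) -> float:
    """Right-hand side of (14.3); corr(l, lp) stands for E[X_l X_l']."""
    return A**2 * sum(
        corr(l, lp) * R_gg((l - lp) * Ts, Ts)
        for l in range(1, N + 1)
        for lp in range(1, N + 1)
    )

def uncorrelated(l: int, lp: int) -> float:
    """Zero-mean, unit-variance, uncorrelated symbols."""
    return 1.0 if l == lp else 0.0

def identical(l: int, lp: int) -> float:
    """X_1 = ... = X_N, the common value being +1 or -1 equiprobably."""
    return 1.0

print(expected_energy(uncorrelated, N=3))
print(expected_energy(identical, N=3))
```

With N = 3 and A = T_s = 1 the uncorrelated symbols give 3 · R_gg(0) = 6, whereas identical symbols give 10. The latter agrees with a direct calculation: the waveform is then ± a staircase taking the values 1, 2, 2, 1 on four consecutive unit intervals, whose energy is 1 + 4 + 4 + 1.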

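Using Proposition 11.2.2 (iv) we can also express $R_{gg}$ as

$$R_{gg}(\tau) = \int_{-\infty}^{\infty} |\hat{g}(f)|^2\, e^{i2\pi f\tau} \, df, \quad \tau \in \mathbb{R} \tag{14.4}$$

and hence rewrite (14.3) as

$$E = A^2 \int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[X_\ell X_{\ell'}]\, e^{i2\pi f(\ell - \ell')T_s}\, |\hat{g}(f)|^2 \, df. \tag{14.5}$$

We define the energy per bit as

$$E_b \triangleq \frac{E}{K} \quad \left[\frac{\text{energy}}{\text{bit}}\right] \tag{14.6}$$

and the energy per real symbol as

$$E_s \triangleq \frac{E}{N} \quad \left[\frac{\text{energy}}{\text{real symbol}}\right]. \tag{14.7}$$

¹ There are some slight measure-theoretic mathematical technicalities that we are sweeping
under the rug. Those are resolved in Section 14.6.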



As we shall see in Section 14.5.2, if infinite data are transmitted using the
binary-to-reals (K, N) block encoder $\operatorname{enc}(\cdot)$, then the resulting transmitted power $P$ is given
by

$$P = \frac{E_s}{T_s}. \tag{14.8}$$

This result will be proved in Section 14.5.2 after we carefully define the average
power. The units work out because if we think of $T_s$ as having units of seconds per
real symbol then:

$$\frac{E_s \left[\dfrac{\text{energy}}{\text{real symbol}}\right]}{T_s \left[\dfrac{\text{second}}{\text{real symbol}}\right]} = \frac{E_s}{T_s} \left[\frac{\text{energy}}{\text{second}}\right]. \tag{14.9}$$



Expression (14.3) for the expected energy $E$ is greatly simplified in two cases that
we discuss next. The first is when the pulse shape $g$ satisfies the orthogonality
condition

$$\int_{-\infty}^{\infty} g(t)\, g(t - \kappa T_s) \, dt = \|g\|_2^2\, \mathrm{I}\{\kappa = 0\}, \quad \kappa \in \{0, 1, \ldots, N-1\}. \tag{14.10}$$

In this case (14.3) simplifies to

$$E = A^2 \|g\|_2^2 \sum_{\ell=1}^{N} \mathsf{E}\bigl[X_\ell^2\bigr], \quad \bigl(\{t \mapsto g(t - \ell T_s)\}_{\ell=1}^{N} \text{ orthogonal}\bigr). \tag{14.11}$$

(In this case one need not even go through the calculation leading to (14.3); the
result simply follows from (14.1) and the Pythagorean Theorem (Theorem 4.5.2).)

The second case for which the computation of $E$ is simplified is when the distribution
of $D_1, \ldots, D_K$ and the mapping $\operatorname{enc}(\cdot)$ result in the real symbols $X_1, \ldots, X_N$
being of zero mean and uncorrelated:²

$$\mathsf{E}[X_\ell] = 0, \quad \ell \in \{1, \ldots, N\} \tag{14.12a}$$

and

$$\mathsf{E}[X_\ell X_{\ell'}] = \mathsf{E}\bigl[X_\ell^2\bigr]\, \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \{1, \ldots, N\}. \tag{14.12b}$$

In this case too (14.3) simplifies to

$$E = A^2 \|g\|_2^2 \sum_{\ell=1}^{N} \mathsf{E}\bigl[X_\ell^2\bigr], \quad \bigl((X_\ell, \ell \in \mathbb{Z}) \text{ zero-mean \& uncorrelated}\bigr). \tag{14.13}$$

² Actually, it suffices that (14.12b) hold; (14.12a) is not needed.




14.3 Defining the Power in PAM 

If $(X(t), t \in \mathbb{R})$ is a continuous-time stochastic process describing the voltage
across a unit-load or the current through a unit-load, then it is reasonable to
define the power $P$ in $(X(t), t \in \mathbb{R})$ as the limit

$$P \triangleq \lim_{T \to \infty} \frac{1}{2T}\, \mathsf{E}\biggl[\int_{-T}^{T} X^2(t) \, dt\biggr]. \tag{14.14}$$


But there is a problem. Over its lifetime, a communication system is only used 
to transmit a finite number of bits, and it only sends a finite amount of energy. 
Consequently, if $(X(t), t \in \mathbb{R})$ corresponds to the transmitted waveform over the
system's lifetime, then P as defined in (14.14) will always end up being zero. The 
definition in (14.14) is thus useless when discussing the transmission of a finite 
number of bits. 

To define power in a useful way we need some modeling trickery. Instead of thinking
about the encoder as producing a finite number of symbols, we should now pretend
that the encoder produces an infinite sequence of symbols $(X_\ell, \ell \in \mathbb{Z})$, which are
then mapped to the infinite sum

$$X(t) = A \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}. \tag{14.15}$$

For the waveform in (14.15), the definition of P in (14.14) makes perfect sense. 
Philosophically speaking, the modeling trickery we employ corresponds to mea- 
suring power on a time scale much greater than the signaling period T s but much 
shorter than the system's lifetime. 

But philosophy aside, there are still two problems we must address: how to model
the generation of the infinite sequence $(X_\ell, \ell \in \mathbb{Z})$, and how to guarantee that
the sum in (14.15) converges for every $t \in \mathbb{R}$. We begin with the latter. If $g$ is of
finite duration, then at every epoch $t \in \mathbb{R}$ only a finite number of terms in (14.15)
are nonzero and convergence is thus guaranteed. But we do not want to restrict
ourselves to finite-duration pulse shapes because those, by Theorem 6.8.2, cannot
be bandlimited. Instead, to guarantee convergence, we shall assume throughout
that the following conditions both hold:

1) The symbols $(X_\ell, \ell \in \mathbb{Z})$ are uniformly bounded in the sense that there
exists some constant $\gamma$ such that

$$|X_\ell| \le \gamma, \quad \ell \in \mathbb{Z}. \tag{14.16}$$

2) The pulse shape $t \mapsto g(t)$ decays faster than $1/t$ in the sense that there exist
positive constants $\alpha, \beta > 0$ such that

$$|g(t)| \le \frac{\beta}{1 + |t/T_s|^{1+\alpha}}, \quad t \in \mathbb{R}. \tag{14.17}$$

Using the fact that the sum $\sum_{n \ge 1} n^{-(1+\alpha)}$ converges whenever $\alpha > 0$ (Rudin,
1976, Theorem 3.28), it is not difficult to show that if both (14.16) and (14.17)
hold, then the infinite sum (14.15) converges at every epoch $t \in \mathbb{R}$.

[Figure 14.1: Bi-Infinite Block Encoding. The bi-infinite data sequence is parsed into
consecutive K-tuples, and $\operatorname{enc}(\cdot)$ maps each K-tuple to an N-tuple of symbols:
$\operatorname{enc}(D_{-K+1}, \ldots, D_0) = (X_{-N+1}, \ldots, X_0)$, $\operatorname{enc}(D_1, \ldots, D_K) = (X_1, \ldots, X_N)$,
$\operatorname{enc}(D_{K+1}, \ldots, D_{2K}) = (X_{N+1}, \ldots, X_{2N})$, and so on.]

As to the generation of $(X_\ell, \ell \in \mathbb{Z})$, we shall consider three scenarios. In the
first, which we analyze in Section 14.5.1, we ignore this issue and simply assume
that $(X_\ell, \ell \in \mathbb{Z})$ is a WSS discrete-time SP of a given autocovariance function.
In the second scenario, which we analyze in Section 14.5.2, we tweak the
block-encoding mode that we introduced in Section 10.4 to account for a bi-infinite data
sequence. We call this tweaked mode bi-infinite block encoding and describe
it more precisely in Section 14.5.2. It is illustrated in Figure 14.1. Finally, the
third scenario, which we analyze in Section 14.5.3, is similar to the first except
that we relax some of the statistical assumptions on $(X_\ell, \ell \in \mathbb{Z})$. But we only
treat the case where the time shifts of the pulse shape by integer multiples of $T_s$
are orthonormal.

Except in the third scenario, we shall only analyze the power in the stochastic
process (14.15) assuming that the symbols $(X_\ell, \ell \in \mathbb{Z})$ are of zero mean

$$\mathsf{E}[X_\ell] = 0, \quad \ell \in \mathbb{Z}. \tag{14.18}$$

This not only simplifies the analysis but also makes engineering sense, because it
guarantees that $(X(t), t \in \mathbb{R})$ is centered

$$\mathsf{E}[X(t)] = 0, \quad t \in \mathbb{R}, \tag{14.19}$$

and, for the reasons that we outline in Section 14.4, transmitting zero-mean
waveforms is usually power efficient.

Figure 14.2: The above two systems have identical performance. In the former
the transmitted power is the power in $t \mapsto X(t)$ whereas in the second it is the
power in $t \mapsto X(t) - c(t)$.



14.4 On the Mean of Transmitted Waveforms 

We next explain why the transmitted waveforms in digital communications are
usually designed to be of zero mean.³ We focus on the case where the transmitted
signal suffers only from an additive disturbance. The key observation is that given
any transmitter that transmits the SP $(X(t), t \in \mathbb{R})$ and any receiver, we can
design a new transmitter that transmits the waveform $t \mapsto X(t) - c(t)$ and a
new receiver with identical performance. Here $c(\cdot)$ is any deterministic signal.
Indeed, the new receiver can simply add $c(\cdot)$ to the received signal and then pass
on the result to the old receiver. That the old and the new systems have identical
performance follows by noting that if $(N(t), t \in \mathbb{R})$ is the added disturbance, then
the received signal on which the old receiver operates is given by $t \mapsto X(t) + N(t)$.
And the received signal in the new system is $t \mapsto X(t) - c(t) + N(t)$, so after we
add $c(\cdot)$ to this signal we obtain the signal $X(t) + N(t)$, which is equal to the signal
that the old receiver operated on. Thus, the performance of a system transmitting
$X(\cdot)$ can be mimicked on a system transmitting $X(\cdot) - c(\cdot)$ by simply adding $c(\cdot)$
at the receiver. See Figure 14.2.

The addition at the receiver of $c(\cdot)$ entails no change in the transmitted power.
Therefore, if a system transmits $X(\cdot)$, then we might be able to improve its power
efficiency without hurting its performance by cleverly choosing $c(\cdot)$ so that the
power in $X(\cdot) - c(\cdot)$ is smaller than the power in $X(\cdot)$ and by then transmitting
$t \mapsto X(t) - c(t)$ instead of $t \mapsto X(t)$. The only additional change we would need
to make is to add $c(\cdot)$ at the receiver.

How should we choose c(-)? To answer this we shall need the following lemma. 



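³ This, however, is not the case with some wireless systems that transmit training sequences
to help the receiver learn the channel and acquire timing information.

Lemma 14.4.1. If $W$ is a random variable of finite variance, then

$$\mathsf{E}\bigl[(W - c)^2\bigr] \ge \operatorname{Var}[W], \quad c \in \mathbb{R} \tag{14.20}$$

with equality if, and only if,

$$c = \mathsf{E}[W]. \tag{14.21}$$

Proof.

$$\begin{aligned}
\mathsf{E}\bigl[(W - c)^2\bigr] &= \mathsf{E}\Bigl[\bigl((W - \mathsf{E}[W]) + (\mathsf{E}[W] - c)\bigr)^2\Bigr] \\
&= \mathsf{E}\bigl[(W - \mathsf{E}[W])^2\bigr] + 2\, \mathsf{E}\bigl[W - \mathsf{E}[W]\bigr]\, (\mathsf{E}[W] - c) + (\mathsf{E}[W] - c)^2 \\
&= \mathsf{E}\bigl[(W - \mathsf{E}[W])^2\bigr] + (\mathsf{E}[W] - c)^2 \\
&\ge \mathsf{E}\bigl[(W - \mathsf{E}[W])^2\bigr] \\
&= \operatorname{Var}[W],
\end{aligned}$$

with equality if, and only if, $c = \mathsf{E}[W]$. □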
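The decomposition used in the proof can be verified exactly on a small example. The sketch below assumes a toy three-point distribution for W (chosen here for illustration, not taken from the text) and uses exact rational arithmetic, so the identity E[(W − c)²] = Var[W] + (E[W] − c)² holds without rounding error.

```python
from fractions import Fraction

# Assumed toy distribution for W: value -> probability (exact rationals).
pmf = {Fraction(-1): Fraction(1, 4), Fraction(0): Fraction(1, 4), Fraction(3): Fraction(1, 2)}

def expect(f):
    """E[f(W)] for the discrete random variable W described by pmf."""
    return sum(p * f(w) for w, p in pmf.items())

mean = expect(lambda w: w)
var = expect(lambda w: (w - mean) ** 2)

def mse(c):
    """E[(W - c)^2], the left-hand side of (14.20)."""
    return expect(lambda w: (w - c) ** 2)

for c in (Fraction(0), mean, Fraction(2)):
    # The decomposition in the proof: E[(W - c)^2] = Var[W] + (E[W] - c)^2.
    print(c, mse(c), var + (mean - c) ** 2)
```

The two printed values agree for every c, and they reduce to Var[W] only when c equals the mean, as (14.21) states.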

With the aid of Lemma 14.4.1 we can now choose $c(\cdot)$ to minimize the power in
$t \mapsto X(t) - c(t)$ as follows. Keeping the definition of power (14.14) in mind, we
study

$$\frac{1}{2T} \int_{-T}^{T} \mathsf{E}\bigl[(X(t) - c(t))^2\bigr] \, dt$$

and note that this expression is minimized over all choices of the waveform $c(\cdot)$ by
minimizing the integrand, i.e., by choosing at every epoch $t$ the value of $c(t)$ to be
the one that minimizes $\mathsf{E}\bigl[(X(t) - c(t))^2\bigr]$. By Lemma 14.4.1 this corresponds to
choosing $c(t)$ to be $\mathsf{E}[X(t)]$. It is thus optimal to choose $c(\cdot)$ as

$$c(t) = \mathsf{E}[X(t)], \quad t \in \mathbb{R}. \tag{14.22}$$

This choice results in the transmitted waveform being $t \mapsto X(t) - \mathsf{E}[X(t)]$, i.e., in
the transmitted waveform being of zero mean.

Stated differently, if in a given system the transmitted waveform is not of zero
mean, then a new system can be built that transmits a waveform of lower (or
equal) average power and whose performance on any additive noise channel is
identical.



14.5 Computing the Power in PAM 

We proceed to compute the power in the signal

$$X(t) = A \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R} \tag{14.23}$$

under various assumptions on the bi-infinite random sequence $(X_\ell, \ell \in \mathbb{Z})$. We
assume throughout that Conditions (14.16) & (14.17) are satisfied so the infinite
sum converges at every epoch $t \in \mathbb{R}$. The power $P$ is defined as in (14.14).⁴



14.5.1 $(X_\ell)$ Is Zero-Mean and WSS

Here we compute the power in the signal (14.23) when $(X_\ell, \ell \in \mathbb{Z})$ is a centered
WSS SP of autocovariance function $K_{XX}$:

$$\mathsf{E}[X_\ell] = 0, \quad \ell \in \mathbb{Z}, \tag{14.24a}$$

$$\mathsf{E}[X_\ell X_{\ell+m}] = K_{XX}(m), \quad \ell, m \in \mathbb{Z}. \tag{14.24b}$$

We further assume that the pulse shape satisfies the decay condition (14.17) and
that the process $(X_\ell, \ell \in \mathbb{Z})$ satisfies the boundedness condition (14.16).

We begin by calculating the expected energy of $X(\cdot)$ in a half-open interval $[\tau, \tau + T_s)$
of length $T_s$ and in showing that this expected energy does not depend on $\tau$, i.e.,
that the expected energy in all intervals of length $T_s$ are identical. We calculate
the energy in the interval $[\tau, \tau + T_s)$ as follows:



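$$\begin{aligned}
E\biggl[\int_{\tau}^{\tau+T_s} X^2(t)\,dt\biggr]
&= A^2\, E\biggl[\int_{\tau}^{\tau+T_s} \Bigl(\sum_{\ell=-\infty}^{\infty} X_\ell\, g(t-\ell T_s)\Bigr)^2 dt\biggr] \\
&= A^2\, E\biggl[\int_{\tau}^{\tau+T_s} \sum_{\ell=-\infty}^{\infty} \sum_{\ell'=-\infty}^{\infty} X_\ell X_{\ell'}\, g(t-\ell T_s)\, g(t-\ell' T_s)\,dt\biggr] \\
&= A^2 \int_{\tau}^{\tau+T_s} \sum_{\ell=-\infty}^{\infty} \sum_{\ell'=-\infty}^{\infty} E[X_\ell X_{\ell'}]\, g(t-\ell T_s)\, g(t-\ell' T_s)\,dt \\
&= A^2 \int_{\tau}^{\tau+T_s} \sum_{\ell=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} E[X_\ell X_{\ell+m}]\, g(t-\ell T_s)\, g\bigl(t-(\ell+m) T_s\bigr)\,dt \\
&= A^2 \int_{\tau}^{\tau+T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m) \sum_{\ell=-\infty}^{\infty} g(t-\ell T_s)\, g\bigl(t-(\ell+m) T_s\bigr)\,dt \qquad (14.25) \\
&= A^2 \sum_{m=-\infty}^{\infty} K_{XX}(m) \sum_{\ell=-\infty}^{\infty} \int_{\tau-\ell T_s}^{\tau+T_s-\ell T_s} g(t')\, g(t'-mT_s)\,dt' \qquad (14.26) \\
&= A^2 \sum_{m=-\infty}^{\infty} K_{XX}(m) \int_{-\infty}^{\infty} g(t')\, g(t'-mT_s)\,dt' \\
&= A^2 \sum_{m=-\infty}^{\infty} K_{XX}(m)\, R_{gg}(mT_s), \quad \tau \in \mathbb{R}, \qquad (14.27)
\end{aligned}$$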



where the first equality follows by the structure of $X(\cdot)$ (14.15); the second by
writing $X^2(t)$ as $X(t)X(t)$ and rearranging terms; the third by the linearity of
the expectation, which allows us to swap the double sum and the expectation
and to take the deterministic term $g(t - \ell T_s)\,g(t - \ell' T_s)$ outside the expectation;
the fourth by defining $m \triangleq \ell' - \ell$; the fifth by (14.24b); the sixth by defining
$t' \triangleq t - \ell T_s$; the seventh by noting that the integrals of a function over all the
intervals $[\tau - \ell T_s,\ \tau - \ell T_s + T_s)$ sum to the integral over the entire real line; and the
final by the definition of the self-similarity function $R_{gg}$ (Section 11.2).

4 A general mathematical definition of the power of a stochastic process is given in
Definition 14.6.1 ahead.

Note that, indeed, the RHS of (14.27) does not depend on the epoch $\tau$ at which
the length-$T_s$ time interval starts. This observation will now help us to compute
the power in $X(\cdot)$. Since the interval $[-T, +T)$ contains $\lfloor 2T/T_s \rfloor$ disjoint intervals
of the form $[\tau, \tau + T_s)$, and since it is contained in the union of $\lceil 2T/T_s \rceil$ such
intervals, it follows that



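$$\Bigl\lfloor \frac{2T}{T_s} \Bigr\rfloor\, E\biggl[\int_{\tau}^{\tau+T_s} X^2(t)\,dt\biggr] \le E\biggl[\int_{-T}^{T} X^2(t)\,dt\biggr] \le \Bigl\lceil \frac{2T}{T_s} \Bigr\rceil\, E\biggl[\int_{\tau}^{\tau+T_s} X^2(t)\,dt\biggr], \tag{14.28}$$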



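where we use $\lfloor \xi \rfloor$ to denote the greatest integer smaller than or equal to $\xi$ (e.g.,
$\lfloor 4.2 \rfloor = 4$), and where we use $\lceil \xi \rceil$ to denote the smallest integer that is greater than
or equal to $\xi$ (e.g., $\lceil 4.2 \rceil = 5$) so

$$\xi - 1 < \lfloor \xi \rfloor \le \lceil \xi \rceil < \xi + 1, \quad \xi \in \mathbb{R}. \tag{14.29}$$

Note that from (14.29) and the Sandwich Theorem it follows that

$$\lim_{T\to\infty} \frac{1}{2T} \Bigl\lfloor \frac{2T}{T_s} \Bigr\rfloor = \lim_{T\to\infty} \frac{1}{2T} \Bigl\lceil \frac{2T}{T_s} \Bigr\rceil = \frac{1}{T_s}, \quad T_s > 0. \tag{14.30}$$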



Dividing (14.28) by 2T and using (14.30) we obtain that 



$$\lim_{T\to\infty} \frac{1}{2T}\, E\biggl[\int_{-T}^{T} X^2(t)\,dt\biggr] = \frac{1}{T_s}\, E\biggl[\int_{\tau}^{\tau+T_s} X^2(t)\,dt\biggr],$$



which combines with (14.27) to yield 



$$P = \frac{1}{T_s}\, A^2 \sum_{m=-\infty}^{\infty} K_{XX}(m)\, R_{gg}(mT_s). \tag{14.31}$$
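
As a numerical sanity check on (14.31), the following Python sketch evaluates the sum using a Riemann-sum approximation of the self-similarity function. The function names (`self_similarity`, `pam_power`), the triangular pulse, the truncation parameter `max_lag`, and all numerical values are illustrative assumptions that do not appear in the text.

```python
import numpy as np


def self_similarity(g, tau, t):
    """Riemann-sum approximation of the integral of g(t) g(t - tau) over the grid t."""
    dt = t[1] - t[0]
    return float(np.sum(g(t) * g(t - tau)) * dt)


def pam_power(A, Ts, K_xx, g, t, max_lag):
    """Evaluate (A^2 / Ts) * sum_m K_xx(m) R_gg(m Ts), truncated to |m| <= max_lag.

    The truncation is exact only if R_gg(m Ts) vanishes for |m| > max_lag.
    """
    total = sum(K_xx(m) * self_similarity(g, m * Ts, t)
                for m in range(-max_lag, max_lag + 1))
    return A ** 2 * total / Ts


# Assumed example: triangular pulse supported on [-Ts, Ts], with Ts = 1 and A = 2.
Ts, A = 1.0, 2.0
t = np.arange(-3.0, 3.0, 1e-3)


def triangle(t):
    return np.maximum(0.0, 1.0 - np.abs(t) / Ts)


def white(m):
    """Autocovariance of zero-mean, uncorrelated, unit-variance symbols."""
    return 1.0 if m == 0 else 0.0


def correlated(m):
    """A valid autocovariance with K(0) = 1 and K(+1) = K(-1) = 0.5."""
    return {0: 1.0, 1: 0.5, -1: 0.5}.get(m, 0.0)


print(pam_power(A, Ts, white, triangle, t, max_lag=3))       # close to 8/3
print(pam_power(A, Ts, correlated, triangle, t, max_lag=3))  # close to 10/3
```

For the uncorrelated sequence only the $m = 0$ term survives, so the result is $A^2 R_{gg}(0)/T_s$; for the correlated sequence the samples $R_{gg}(\pm T_s)$ of the self-similarity function also contribute.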



The power P can be alternatively expressed in the frequency domain using (14.31) 
and (14.4) as 



$$P = \frac{A^2}{T_s} \int_{-\infty}^{\infty} \sum_{m=-\infty}^{\infty} K_{XX}(m)\, e^{i 2\pi f m T_s}\, |\hat{g}(f)|^2\, df. \tag{14.32}$$



An important special case of (14.31) is when the symbols $(X_\ell)$ are zero-mean,
uncorrelated, and of equal variance $\sigma_X^2$. In this case $K_{XX}(m) = \sigma_X^2\, I\{m = 0\}$, and
the only nonzero term in (14.31) is the term corresponding to $m = 0$ so

$$P = \frac{1}{T_s}\, A^2\, \|g\|_2^2\, \sigma_X^2. \tag{14.33}$$

14.5.2 Bi-Infinite Block-Mode

The bi-infinite block-mode with a $(K, N)$ binary-to-reals block encoder

$$\text{enc}\colon \{0,1\}^K \to \mathbb{R}^N$$

is depicted in Figure 14.1 and can be described as follows. A bi-infinite sequence
of data bits $(D_j,\ j \in \mathbb{Z})$ is fed to an encoder. The encoder parses this sequence
into $K$-tuples and defines for every integer $\nu \in \mathbb{Z}$ the "$\nu$-th data block" $\mathbf{D}_\nu$

$$\mathbf{D}_\nu \triangleq (D_{\nu K+1}, \ldots, D_{\nu K+K}), \quad \nu \in \mathbb{Z}. \tag{14.34}$$

Each data block $\mathbf{D}_\nu$ is then mapped by enc$(\cdot)$ to a real $N$-tuple, which we denote
by $\mathbf{X}_\nu$:

$$\mathbf{X}_\nu = \text{enc}(\mathbf{D}_\nu), \quad \nu \in \mathbb{Z}. \tag{14.35}$$

The bi-infinite sequence $(X_\ell,\ \ell \in \mathbb{Z})$ produced by the encoder is the concatenation
of these $N$-tuples so

$$(X_{\nu N+1}, \ldots, X_{\nu N+N}) = \mathbf{X}_\nu, \quad \nu \in \mathbb{Z}. \tag{14.36}$$

Stated differently, for every $\nu \in \mathbb{Z}$ and $\eta \in \{1, \ldots, N\}$, the symbol $X_{\nu N+\eta}$ is the
$\eta$-th component of the $N$-tuple $\mathbf{X}_\nu$. The transmitted signal $X(\cdot)$ is as in (14.15)
with the pulse shape $g$ satisfying the decay condition (14.17) and with $T_s > 0$ being
arbitrary. (The boundedness condition (14.16) is always guaranteed in bi-infinite
block encoding.)

We next compute the power $P$ in $X(\cdot)$ under the assumption that the data bits
$(D_j,\ j \in \mathbb{Z})$ are independent and identically distributed (IID) random bits, where
we adopt the following definition.

Definition 14.5.1 (IID Random Bits). We say that a collection of random variables
are IID random bits if the random variables are independent and each of them
takes on the values 0 and 1 equiprobably.

The assumption that the bi-infinite data sequence $(D_j,\ j \in \mathbb{Z})$ consists of IID
random bits is equivalent to the assumption that the $K$-tuples $(\mathbf{D}_\nu,\ \nu \in \mathbb{Z})$ are
IID with $\mathbf{D}_\nu$ being uniformly distributed over the set of binary $K$-tuples $\{0,1\}^K$.
We shall also assume that the real $N$-tuple enc$(\mathbf{D})$ is of zero mean whenever the
binary $K$-tuple $\mathbf{D}$ is uniformly distributed over $\{0,1\}^K$. We will show that, subject to
these assumptions,



$$P = \frac{1}{N T_s}\, E\biggl[\int_{-\infty}^{\infty} \Bigl(A \sum_{\ell=1}^{N} X_\ell\, g(t - \ell T_s)\Bigr)^2 dt\biggr]. \tag{14.37}$$

This expression has an interesting interpretation. On the LHS is the power in
the transmitted signal in bi-infinite block encoding using the $(K, N)$ binary-to-reals
block encoder enc$(\cdot)$. On the RHS is the quantity $E/(N T_s)$, where $E$, as in (14.3), is
the expected energy in the signal that results when only the $K$-tuple $(D_1, \ldots, D_K)$
is transmitted from time $-\infty$ to time $+\infty$. Using the definition of the energy
per symbol $E_s$ (14.7) we can also rewrite (14.37) as in (14.8). Thus, in bi-infinite
block-mode, the transmitted power is the energy per real symbol $E_s$ normalized by
the signaling period $T_s$. Also, by (14.5), we can rewrite (14.37) as

$$P = \frac{A^2}{N T_s} \int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} E[X_\ell X_{\ell'}]\, e^{i 2\pi f (\ell - \ell') T_s}\, |\hat{g}(f)|^2\, df. \tag{14.38}$$

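To derive (14.37) we first express the transmitted waveform $X(\cdot)$ as

$$\begin{aligned}
X(t) &= A \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - \ell T_s) \\
&= A \sum_{\nu=-\infty}^{\infty} \sum_{\eta=1}^{N} X_{\nu N+\eta}\, g\bigl(t - (\nu N + \eta) T_s\bigr) \\
&= A \sum_{\nu=-\infty}^{\infty} u(\mathbf{X}_\nu,\ t - \nu N T_s), \quad t \in \mathbb{R}, \qquad (14.39)
\end{aligned}$$

where the function $u\colon \mathbb{R}^N \times \mathbb{R} \to \mathbb{R}$ is given by

$$u\colon (x_1, \ldots, x_N, t) \mapsto \sum_{\eta=1}^{N} x_\eta\, g(t - \eta T_s). \tag{14.40}$$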

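We now make three observations. The first is that because the law of $\mathbf{D}_\nu$ does not
depend on $\nu$, neither does the law of $\mathbf{X}_\nu$ (= enc$(\mathbf{D}_\nu)$):

$$\mathbf{X}_\nu \overset{\mathscr{L}}{=} \mathbf{X}_{\nu'}, \quad \nu, \nu' \in \mathbb{Z}. \tag{14.41}$$

The second is that the assumption that enc$(\mathbf{D})$ is of zero mean whenever $\mathbf{D}$ is
uniformly distributed over $\{0,1\}^K$ implies by (14.40) that

$$E\bigl[u(\mathbf{X}_\nu, t)\bigr] = 0, \quad (\nu \in \mathbb{Z},\ t \in \mathbb{R}). \tag{14.42}$$

The third is that the hypothesis that the data bits $(D_j,\ j \in \mathbb{Z})$ are IID implies
that $(\mathbf{D}_\nu,\ \nu \in \mathbb{Z})$ are IID and hence that $(\mathbf{X}_\nu,\ \nu \in \mathbb{Z})$ are also IID. Consequently,
since the independence of $\mathbf{X}_\nu$ and $\mathbf{X}_{\nu'}$ implies the independence of $u(\mathbf{X}_\nu, t)$ and
$u(\mathbf{X}_{\nu'}, t')$, it follows from (14.42) that

$$E\bigl[u(\mathbf{X}_\nu, t)\, u(\mathbf{X}_{\nu'}, t')\bigr] = 0, \quad (t, t' \in \mathbb{R},\ \nu \ne \nu',\ \nu, \nu' \in \mathbb{Z}). \tag{14.43}$$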



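Using (14.39) and these three observations we can now compute for any epoch $\tau \in \mathbb{R}$
the expected energy in the time interval $[\tau, \tau + N T_s)$ as

$$\begin{aligned}
\int_{\tau}^{\tau+NT_s} E[X^2(t)]\,dt
&= \int_{\tau}^{\tau+NT_s} E\biggl[\Bigl(A \sum_{\nu=-\infty}^{\infty} u(\mathbf{X}_\nu,\ t - \nu N T_s)\Bigr)^2\biggr] dt \\
&= A^2 \int_{\tau}^{\tau+NT_s} \sum_{\nu=-\infty}^{\infty} \sum_{\nu'=-\infty}^{\infty} E\bigl[u(\mathbf{X}_\nu,\ t - \nu N T_s)\, u(\mathbf{X}_{\nu'},\ t - \nu' N T_s)\bigr]\,dt \\
&= A^2 \int_{\tau}^{\tau+NT_s} \sum_{\nu=-\infty}^{\infty} E\bigl[u^2(\mathbf{X}_\nu,\ t - \nu N T_s)\bigr]\,dt \\
&= A^2 \int_{\tau}^{\tau+NT_s} \sum_{\nu=-\infty}^{\infty} E\bigl[u^2(\mathbf{X}_0,\ t - \nu N T_s)\bigr]\,dt \\
&= A^2 \sum_{\nu=-\infty}^{\infty} \int_{\tau-\nu N T_s}^{\tau-(\nu-1) N T_s} E\bigl[u^2(\mathbf{X}_0, t')\bigr]\,dt' \\
&= A^2 \int_{-\infty}^{\infty} E\bigl[u^2(\mathbf{X}_0, t')\bigr]\,dt' \\
&= E\biggl[\int_{-\infty}^{\infty} \Bigl(A \sum_{\ell=1}^{N} X_\ell\, g(t - \ell T_s)\Bigr)^2 dt\biggr], \quad \tau \in \mathbb{R}, \qquad (14.44)
\end{aligned}$$

where the first equality follows from (14.39); the second by writing the square as
a product and by using the linearity of expectation; the third from (14.43); the
fourth because the law of $\mathbf{X}_\nu$ does not depend on $\nu$ (14.41); the fifth by changing
the integration variable to $t' \triangleq t - \nu N T_s$; the sixth because the sum of the integrals
is equal to the integral over $\mathbb{R}$; and the seventh by (14.40).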

Note that, indeed, the RHS of (14.44) does not depend on the starting epoch $\tau$ of
the interval. Because there are $\lfloor 2T/(N T_s) \rfloor$ disjoint length-$N T_s$ half-open intervals
contained in the interval $[-T, T)$ and because $\lceil 2T/(N T_s) \rceil$ such intervals suffice to
cover the interval $[-T, T)$, it follows that

$$\Bigl\lfloor \frac{2T}{N T_s} \Bigr\rfloor\, E\biggl[\int_{-\infty}^{\infty} \Bigl(A \sum_{\ell=1}^{N} X_\ell\, g(t - \ell T_s)\Bigr)^2 dt\biggr] \le E\biggl[\int_{-T}^{T} X^2(t)\,dt\biggr] \le \Bigl\lceil \frac{2T}{N T_s} \Bigr\rceil\, E\biggl[\int_{-\infty}^{\infty} \Bigl(A \sum_{\ell=1}^{N} X_\ell\, g(t - \ell T_s)\Bigr)^2 dt\biggr].$$

Dividing by $2T$ and then letting $T$ tend to infinity establishes (14.37).
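
The RHS of (14.37) can be evaluated numerically for small encoders by enumerating all $2^K$ equiprobable data blocks. The Python sketch below does this under stated assumptions: the function name `block_mode_power`, the two toy encoders, and the triangular pulse are hypothetical illustrations and are not taken from the text.

```python
import itertools

import numpy as np


def block_mode_power(enc, K, N, g, Ts, A, t):
    """Return E / (N Ts), where E is the expected energy of
    A * sum_{l=1}^{N} X_l g(t - l Ts) when the K data bits are IID random bits.

    The expectation is computed exactly by enumerating all 2^K data blocks;
    the time integral is approximated by a Riemann sum on the grid t.
    """
    dt = t[1] - t[0]
    blocks = itertools.product((0, 1), repeat=K)
    symbols = np.array([enc(d) for d in blocks], dtype=float)      # shape (2^K, N)
    if not np.allclose(symbols.mean(axis=0), 0.0):
        raise ValueError("enc(D) must be of zero mean for uniform D")
    shifts = np.stack([g(t - l * Ts) for l in range(1, N + 1)])    # shape (N, len(t))
    waveforms = A * (symbols @ shifts)                             # shape (2^K, len(t))
    energies = np.sum(waveforms ** 2, axis=1) * dt
    return float(energies.mean() / (N * Ts))


def parity_encoder(d):
    """Hypothetical (2, 3) encoder: two antipodal symbols plus a parity symbol."""
    return [1 - 2 * d[0], 1 - 2 * d[1], 1 - 2 * (d[0] ^ d[1])]


def repetition_encoder(d):
    """Hypothetical (1, 2) repetition encoder: 0 -> (+1, +1), 1 -> (-1, -1)."""
    return [1 - 2 * d[0], 1 - 2 * d[0]]


def triangle(t):
    """Assumed pulse: triangle supported on [-1, 1], used with Ts = 1."""
    return np.maximum(0.0, 1.0 - np.abs(t))


t = np.arange(-2.0, 7.0, 1e-3)
print(block_mode_power(parity_encoder, 2, 3, triangle, 1.0, 1.0, t))      # close to 2/3
print(block_mode_power(repetition_encoder, 1, 2, triangle, 1.0, 1.0, t))  # close to 5/6
```

The parity encoder produces pairwise uncorrelated symbols, so only the pulse energy contributes; the repetition encoder produces fully correlated symbols, and the overlap between adjacent pulses raises the power.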



14.5.3 Time Shifts of Pulse Shape Are Orthonormal 

We next consider the power in PAM when the time shifts of the real pulse shape by
integer multiples of $T_s$ are orthonormal. To remind the reader of this assumption,
we change notation and denote the pulse shape by $\phi(\cdot)$ and express the
orthonormality condition as

$$\int_{-\infty}^{\infty} \phi(t - \ell T_s)\, \phi(t - \ell' T_s)\,dt = I\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}. \tag{14.45}$$

The calculation of the power is a bit tricky because (14.45) only guarantees that the
time shifts of the pulse shape are orthogonal over the interval $(-\infty, \infty)$; they need
not be orthogonal over the interval $[-T, +T]$ (even for very large $T$). Nevertheless,
intuition suggests that if $\ell T_s$ and $\ell' T_s$ are both much smaller than $T$, then the
orthogonality of $t \mapsto \phi(t - \ell T_s)$ and $t \mapsto \phi(t - \ell' T_s)$ over the interval $(-\infty, \infty)$
should imply that they are nearly orthogonal over $[-T, T]$. Making this intuition
rigorous is a bit tricky and the calculation of the energy in the interval $[-T, T]$
requires a fair number of approximations that must be justified.

To control these approximations we shall assume a decay condition on the pulse
shape that is identical to (14.17). Thus, we shall assume that there exist positive
constants $\alpha$ and $\beta$ such that

$$|\phi(t)| \le \frac{\beta}{1 + |t/T_s|^{1+\alpha}}, \quad t \in \mathbb{R}. \tag{14.46}$$

(The pulse shapes used in practice, like those we encountered in (11.31), typically
decay like $1/|t|^2$ so this is not a serious restriction.) We shall also continue to assume
the boundedness condition (14.16) but otherwise make no statistical assumptions
on the symbols $(X_\ell,\ \ell \in \mathbb{Z})$.

The main result of this section is the next theorem. 

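Theorem 14.5.2. Let the continuous-time SP $(X(t),\ t \in \mathbb{R})$ be given by

$$X(t) = A \sum_{\ell=-\infty}^{\infty} X_\ell\, \phi(t - \ell T_s), \quad t \in \mathbb{R}, \tag{14.47}$$

where $A > 0$; $T_s > 0$; the pulse shape $\phi(\cdot)$ is a Borel measurable function satisfying
the orthogonality condition (14.45) and the decay condition (14.46); and where the
random sequence $(X_\ell,\ \ell \in \mathbb{Z})$ satisfies the boundedness condition (14.16). Then

$$\lim_{T\to\infty} \frac{1}{2T}\, E\biggl[\int_{-T}^{T} X^2(t)\,dt\biggr] = \frac{A^2}{T_s} \lim_{L\to\infty} \frac{1}{2L+1} \sum_{\ell=-L}^{L} E[X_\ell^2] \tag{14.48}$$

whenever the limit on the RHS exists.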
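
The statement of the theorem can be illustrated numerically. The Python sketch below uses an assumed piecewise-constant pulse of duration $2T_s$ whose shifts by integer multiples of $T_s$ are orthonormal even though adjacent shifts overlap, and it compares the time-averaged power over a window $[-T, T)$ with the symbol-domain average on the RHS of (14.48). The pulse, the deterministic symbol sequence, and all function names are hypothetical choices made for this illustration; the sketch checks one finite window and is not a substitute for the proof.

```python
import numpy as np


def phi(t, Ts):
    """Assumed pulse: +c on [0, 1.5 Ts), -c on [1.5 Ts, 2 Ts), zero elsewhere,
    with c = 1 / sqrt(2 Ts) so that the pulse has unit energy."""
    c = 1.0 / np.sqrt(2.0 * Ts)
    plus = (t >= 0.0) & (t < 1.5 * Ts)
    minus = (t >= 1.5 * Ts) & (t < 2.0 * Ts)
    return c * (plus.astype(float) - minus.astype(float))


def midpoints(start, stop, dt):
    """Midpoints of the cells of width dt that tile [start, stop)."""
    n = int(round((stop - start) / dt))
    return start + (np.arange(n) + 0.5) * dt


def shift_inner_products(Ts, max_shift, dt):
    """Inner products of phi(t) with phi(t - m Ts) for m = 0, ..., max_shift."""
    t = midpoints(-Ts, (max_shift + 3) * Ts, dt)
    return np.array([np.sum(phi(t, Ts) * phi(t - m * Ts, Ts)) * dt
                     for m in range(max_shift + 1)])


def windowed_power(x, A, Ts, T, dt):
    """(1 / 2T) times the integral over [-T, T) of (A sum_l x_l phi(t - l Ts))^2,
    where x lists the symbols for l = -L, ..., L."""
    L = (len(x) - 1) // 2
    t = midpoints(-T, T, dt)
    signal = np.zeros_like(t)
    for l, x_l in zip(range(-L, L + 1), x):
        signal += x_l * phi(t - l * Ts, Ts)
    return float(np.sum((A * signal) ** 2) * dt / (2.0 * T))


Ts, A, L = 1.0, 1.5, 200
x = np.cos(np.arange(-L, L + 1))             # a bounded deterministic symbol sequence
print(shift_inner_products(Ts, 3, 0.01))      # approximately [1, 0, 0, 0]
print(windowed_power(x, A, Ts, T=L * Ts, dt=0.01))
print(A ** 2 / Ts * np.mean(x ** 2))          # symbol-domain average, for comparison
```

The two printed powers differ only through the pulses that straddle the edges of the window, a discrepancy that becomes negligible relative to $2T$ as the window grows.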



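Proof. The proof is somewhat technical and may be skipped. We begin by arguing
that it suffices to prove the theorem for the case where $T_s = 1$. To see this, assume
that $T_s > 0$ is not necessarily equal to 1. Define the function

$$\tilde{\phi}(t) = \sqrt{T_s}\, \phi(T_s t), \quad t \in \mathbb{R}, \tag{14.49}$$

and note that, by changing the integration variable to $\tau \triangleq t T_s$,

$$\int_{-\infty}^{\infty} \tilde{\phi}(t - \ell)\, \tilde{\phi}(t - \ell')\,dt = \int_{-\infty}^{\infty} \phi(\tau - \ell T_s)\, \phi(\tau - \ell' T_s)\,d\tau = I\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}, \tag{14.50a}$$

where the second equality follows from the theorem's assumption about the
orthogonality of the time shifts of $\phi$ by integer multiples of $T_s$. Also, by (14.49) and
(14.46) we obtain

$$|\tilde{\phi}(t)| = \sqrt{T_s}\, |\phi(T_s t)| \le \sqrt{T_s}\, \frac{\beta}{1 + |t|^{1+\alpha}} = \frac{\beta'}{1 + |t|^{1+\alpha}}, \quad t \in \mathbb{R}, \tag{14.50b}$$

for some $\beta' > 0$ and $\alpha > 0$.

As to the power, by changing the integration variable to $\sigma \triangleq t/T_s$ we obtain

$$\frac{1}{2T} \int_{-T}^{T} \Bigl(\sum_{\ell} X_\ell\, \phi(t - \ell T_s)\Bigr)^2 dt = \frac{1}{T_s}\, \frac{1}{2(T/T_s)} \int_{-T/T_s}^{T/T_s} \Bigl(\sum_{\ell} X_\ell\, \tilde{\phi}(\sigma - \ell)\Bigr)^2 d\sigma. \tag{14.50c}$$

It now follows from (14.50a) & (14.50b) that if we prove the theorem for the pulse
shape $\tilde{\phi}$ with $T_s = 1$, it will then follow that the power in $\sum X_\ell\, \tilde{\phi}(\sigma - \ell)$ is equal
to $\lim_{L\to\infty} (2L+1)^{-1} \sum E[X_\ell^2]$ and that consequently, by (14.50c), the power in
$\sum X_\ell\, \phi(t - \ell T_s)$ is equal to $T_s^{-1} \lim_{L\to\infty} (2L+1)^{-1} \sum E[X_\ell^2]$. In the remainder of the
proof we shall thus assume that $T_s = 1$ and express the decay condition (14.46) as

$$|\phi(t)| \le \frac{\beta}{1 + |t|^{1+\alpha}}, \quad t \in \mathbb{R} \tag{14.51}$$

for some $\beta, \alpha > 0$.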

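To further simplify notation we shall assume that $T$ is a positive integer. Indeed,
if the limit is proved for positive integers, then the general result follows from the
Sandwich Theorem by noting that for $T > 0$ (not necessarily an integer)

$$\frac{1}{2T} \int_{-\lfloor T \rfloor}^{\lfloor T \rfloor} X^2(t)\,dt \le \frac{1}{2T} \int_{-T}^{T} X^2(t)\,dt \le \frac{1}{2T} \int_{-\lceil T \rceil}^{\lceil T \rceil} X^2(t)\,dt$$

and by noting that both $\lfloor T \rfloor / T$ and $\lceil T \rceil / T$ tend to 1, as $T \to \infty$.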

We thus proceed to prove (14.48) for the case where $T_s = 1$ and where the limit
$T \to \infty$ is only over positive integers. We also assume $A = 1$ because both sides of
(14.48) scale like $A^2$. We begin by introducing some notation. For every integer $\ell$
we denote the mapping $t \mapsto \phi(t - \ell)$ by $\phi_\ell$, and for every positive integer $T$ we
denote the windowed mapping $t \mapsto \phi(t - \ell)\, I\{|t| \le T\}$ by $\phi_{\ell,w}$. Finally, we fix some
(large) integer $\nu > 0$ and define for every $T > \nu$, the random processes

$$X_0 = \sum_{|\ell| \le T-\nu} X_\ell\, \phi_{\ell,w}, \tag{14.53}$$

$$X_1 = \sum_{T-\nu < |\ell| \le T+\nu} X_\ell\, \phi_{\ell,w}, \tag{14.54}$$

$$X_2 = \sum_{T+\nu < |\ell| < \infty} X_\ell\, \phi_{\ell,w}, \tag{14.55}$$

and the unwindowed version of $X_0$

$$X_0^{\mathrm{u}} = \sum_{|\ell| \le T-\nu} X_\ell\, \phi_\ell. \tag{14.56}$$

Note that

$$\begin{aligned}
X(t)\, I\{|t| \le T\} &= X_0(t) + X_1(t) + X_2(t) \\
&= X_0^{\mathrm{u}}(t) + \bigl(X_0(t) - X_0^{\mathrm{u}}(t)\bigr) + X_1(t) + X_2(t), \quad t \in \mathbb{R}. \qquad (14.57)
\end{aligned}$$

Using arguments very similar to the ones leading to (4.14) (with integration
replaced by integration and expectation) one can show that (14.57) leads to the
bound
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(14.58) 



Note that, by the orthonormality assumption on the time shifts of $\phi$,
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It follows from (14.58) and (14.59) that to conclude the proof of the theorem it 
suffices to show that for every fixed v > 2 we have for T exceeding v 



and that 
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(14.60) 
(14.61) 

(14.62) 
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We begin with (14.60), which follows directly from the Triangle Inequality, 
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T-v<\l\<T+v 



where the second inequality follows from the boundedness condition (14.16), from
the fact that $\phi_{\ell,w}$ is a windowed version of the unit-energy signal $\phi_\ell$ so
$\|\phi_{\ell,w}\|_2 \le \|\phi_\ell\|_2 = 1$, and because there are $4\nu$ terms in the sum.



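We next prove (14.62). To that end we upper-bound $|X_2(t)|$ for $|t| \le T$ as follows:

$$\begin{aligned}
|X_2(t)| &= \Bigl|\sum_{T+\nu < |\ell| < \infty} X_\ell\, \phi(t - \ell)\Bigr| \\
&\le \gamma \sum_{T+\nu < |\ell| < \infty} |\phi(t - \ell)| \\
&\le \gamma \beta \sum_{T+\nu < |\ell| < \infty} \frac{1}{|t - \ell|^{1+\alpha}} \\
&\le \gamma \beta \sum_{T+\nu < |\ell| < \infty} \frac{1}{\bigl(|\ell| - |t|\bigr)^{1+\alpha}} \\
&\le \gamma \beta \sum_{T+\nu < |\ell| < \infty} \frac{1}{\bigl(|\ell| - T\bigr)^{1+\alpha}} \\
&= 2 \gamma \beta \sum_{\ell=T+\nu+1}^{\infty} \frac{1}{(\ell - T)^{1+\alpha}} \\
&= 2 \gamma \beta \sum_{\tilde{\ell}=\nu+1}^{\infty} \frac{1}{\tilde{\ell}^{1+\alpha}} \\
&\le 2 \gamma \beta \int_{\nu}^{\infty} \xi^{-1-\alpha}\,d\xi \\
&= \frac{2 \gamma \beta}{\alpha}\, \nu^{-\alpha}, \quad t \in \mathbb{R}, \qquad (14.63)
\end{aligned}$$

where the equality in the first line follows from the definition of $X_2$ (14.55) by
noting that for $|t| \le T$ we have $\phi_\ell(t) = \phi_{\ell,w}(t)$; the inequality in the second line
follows from the boundedness condition (14.16) and from the Triangle Inequality for
Complex Numbers (2.12); the inequality in the third line from the decay condition
(14.51); the inequality in the fourth line because $|\xi - \zeta| \ge |\xi| - |\zeta|$ whenever
$\xi, \zeta \in \mathbb{R}$; the inequality in the fifth line because we are only considering $|t| \le T$ and
because over the range of this summation $|\ell| > T + \nu$; the equality in the sixth line
from the symmetry of the summand; the equality in the seventh line by defining
$\tilde{\ell} \triangleq \ell - T$; the inequality in the eighth line from the monotonicity of the function
$\xi \mapsto \xi^{-1-\alpha}$, which implies that

$$\frac{1}{\tilde{\ell}^{1+\alpha}} \le \int_{\tilde{\ell}-1}^{\tilde{\ell}} \frac{1}{\xi^{1+\alpha}}\,d\xi;$$

and where the final equality on the ninth line follows by computing the integral
and by noting that for $t$ that does not satisfy $|t| \le T$ the LHS $|X_2(t)|$ is zero, so
the inequality is trivial.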

Using (14.63) and noting that $X_2(t)$ is zero for $|t| > T$, we conclude that

$$\|X_2\|_2^2 \le 2T \Bigl(\frac{2 \gamma \beta}{\alpha}\Bigr)^2 \nu^{-2\alpha}, \tag{14.64}$$



from which (14.62) follows. 

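We next turn to proving (14.61). We begin by using the Triangle Inequality and
the boundedness condition (14.16) to obtain

$$\|X_0 - X_0^{\mathrm{u}}\|_2 \le \gamma \sum_{|\ell| \le T-\nu} \|\phi_\ell - \phi_{\ell,w}\|_2. \tag{14.65}$$

We next proceed to upper-bound the RHS of (14.65) by first defining the function

$$\rho(\tau) \triangleq \int_{|t| > \tau} \phi^2(t)\,dt \tag{14.66}$$

and by then using this function to upper-bound $\|\phi_\ell - \phi_{\ell,w}\|_2$ as

$$\|\phi_\ell - \phi_{\ell,w}\|_2^2 \le \rho(T - |\ell|), \quad |\ell| \le T, \tag{14.67}$$

because

$$\begin{aligned}
\|\phi_\ell - \phi_{\ell,w}\|_2^2 &= \int_{-\infty}^{-T} \phi^2(t - \ell)\,dt + \int_{T}^{\infty} \phi^2(t - \ell)\,dt \\
&= \int_{-\infty}^{-T-\ell} \phi^2(s)\,ds + \int_{T-\ell}^{\infty} \phi^2(s)\,ds \\
&\le \int_{-\infty}^{-T+|\ell|} \phi^2(s)\,ds + \int_{T-|\ell|}^{\infty} \phi^2(s)\,ds \\
&= \int_{|s| > T-|\ell|} \phi^2(s)\,ds \\
&= \rho(T - |\ell|), \quad |\ell| \le T.
\end{aligned}$$

It follows from (14.65) and (14.67) that

$$\begin{aligned}
\|X_0 - X_0^{\mathrm{u}}\|_2^2 &\le \gamma^2 \Bigl(\sum_{|\ell| \le T-\nu} \|\phi_\ell - \phi_{\ell,w}\|_2\Bigr)^2 \\
&\le \gamma^2 \Bigl(\sum_{|\ell| \le T-\nu} \sqrt{\rho(T - |\ell|)}\Bigr)^2 \\
&\le \gamma^2 \Bigl(2 \sum_{0 \le \ell \le T-\nu} \sqrt{\rho(T - \ell)}\Bigr)^2 \\
&= 4 \gamma^2 \Bigl(\sum_{\eta=\nu}^{T} \sqrt{\rho(\eta)}\Bigr)^2. \qquad (14.68)
\end{aligned}$$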

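We next note that the decay condition (14.51) implies that

$$\rho(\tau) \le \frac{2 \beta^2}{1 + 2\alpha}\, \tau^{-1-2\alpha}, \quad \tau > 0, \tag{14.69}$$

because for every $\tau > 0$,

$$\begin{aligned}
\rho(\tau) &= \int_{|t| > \tau} \phi^2(t)\,dt \\
&\le \int_{|t| > \tau} \frac{\beta^2}{|t|^{2+2\alpha}}\,dt \\
&= 2 \beta^2 \int_{\tau}^{\infty} t^{-2-2\alpha}\,dt \\
&= \frac{2 \beta^2}{1 + 2\alpha}\, \tau^{-1-2\alpha}.
\end{aligned}$$

It now follows from (14.69) that

$$\sum_{\eta=\nu}^{T} \sqrt{\rho(\eta)} \le \Bigl(\frac{2 \beta^2}{1 + 2\alpha}\Bigr)^{1/2} \sum_{\eta=\nu}^{T} \eta^{-1/2-\alpha} \le \Bigl(\frac{2 \beta^2}{1 + 2\alpha}\Bigr)^{1/2} \int_{\nu-1}^{T} \xi^{-1/2-\alpha}\,d\xi$$

and hence, by evaluating the integral explicitly, that

$$\lim_{T\to\infty} \frac{1}{2T} \Bigl(\sum_{\eta=\nu}^{T} \sqrt{\rho(\eta)}\Bigr)^2 = 0. \tag{14.70}$$

From (14.68) and (14.70) we thus obtain (14.61). □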

14.6 A More Formal Account 

In this section we present a more formal definition of power and justify some of
the mathematical steps that we took in deriving the power in PAM signals. This
section is quite mathematical and is recommended for readers who have had some
exposure to Measure Theory.

Let $\mathcal{R}$ denote the $\sigma$-algebra generated by the open sets in $\mathbb{R}$. A continuous-time
stochastic process $(X(t))$ defined over the probability space $(\Omega, \mathcal{F}, P)$ is said to be
a measurable stochastic process if the mapping $(\omega, t) \mapsto X(\omega, t)$ from $\Omega \times \mathbb{R}$
to $\mathbb{R}$ is measurable when its range $\mathbb{R}$ is endowed with the $\sigma$-algebra $\mathcal{R}$ and when its
domain $\Omega \times \mathbb{R}$ is endowed with the product $\sigma$-algebra $\mathcal{F} \times \mathcal{R}$. Thus, $(X(t),\ t \in \mathbb{R})$
is measurable if the mapping $(\omega, t) \mapsto X(\omega, t)$ is $\mathcal{F} \times \mathcal{R} / \mathcal{R}$ measurable.

From Fubini's Theorem it follows that if $(X(t),\ t \in \mathbb{R})$ is measurable and if $T > 0$
is deterministic, then:

(i) For every $\omega \in \Omega$, the mapping $t \mapsto X^2(\omega, t)$ is Borel measurable;

(ii) the mapping

$$\omega \mapsto \int_{-T}^{T} X^2(\omega, t)\,dt$$

is a random variable (i.e., $\mathcal{F}$ measurable) possibly taking on the value $+\infty$;

(iii) and

$$E\biggl[\int_{-T}^{T} X^2(t)\,dt\biggr] = \int_{-T}^{T} E[X^2(t)]\,dt, \quad T \in \mathbb{R}. \tag{14.71}$$



Definition 14.6.1 (Power of a Stochastic Process). We say that a measurable
stochastic process $(X(t),\ t \in \mathbb{R})$ is of power $P$ if the limit

$$\lim_{T\to\infty} \frac{1}{2T}\, E\biggl[\int_{-T}^{T} X^2(t)\,dt\biggr] \tag{14.72}$$

exists and is equal to $P$.



Proposition 14.6.2. If the pulse shape $g$ is a Borel measurable function satisfying
the decay condition (14.17) for some positive $\alpha, \beta, T_s$, and if the discrete-time SP
$(X_\ell,\ \ell \in \mathbb{Z})$ satisfies the boundedness condition (14.16) for some $\gamma > 0$, then the
stochastic process

$$X\colon (\omega, t) \mapsto A \sum_{\ell=-\infty}^{\infty} X_\ell(\omega)\, g(t - \ell T_s) \tag{14.73}$$

is a measurable stochastic process.

Proof. The mapping $(\omega, t) \mapsto X_\ell(\omega)$ is $\mathcal{F} \times \mathcal{R} / \mathcal{R}$ measurable because $X_\ell$ is a
random variable, so the mapping $\omega \mapsto X_\ell(\omega)$ is $\mathcal{F} / \mathcal{R}$ measurable. The mapping
$(\omega, t) \mapsto A\, g(t - \ell T_s)$ is $\mathcal{F} \times \mathcal{R} / \mathcal{R}$ measurable because $g$ is Borel measurable, so
$t \mapsto g(t - \ell T_s)$ is $\mathcal{R} / \mathcal{R}$ measurable. Since the product of measurable functions is
measurable (Rudin, 1974, Chapter 1, Section 1.9 (c)), it follows that the mapping
$(\omega, t) \mapsto A\, X_\ell(\omega)\, g(t - \ell T_s)$ is $\mathcal{F} \times \mathcal{R} / \mathcal{R}$ measurable. And since the sum of
measurable functions is measurable (Rudin, 1974, Chapter 1, Section 1.9 (c)), it follows
that for every positive integer $L \in \mathbb{Z}$, the mapping

$$(\omega, t) \mapsto A \sum_{\ell=-L}^{L} X_\ell(\omega)\, g(t - \ell T_s)$$

is $\mathcal{F} \times \mathcal{R} / \mathcal{R}$ measurable. The proposition now follows by recalling that the pointwise
limit of every pointwise convergent sequence of measurable functions is measurable
(Rudin, 1974, Theorem 1.14). □

5 See (Billingsley, 1995, Section 37, p. 503) or (Loeve, 1963, Section 35) for the definition of a
measurable stochastic process and see (Billingsley, 1995, Section 18) or (Loeve, 1963, Section 8.2)
or (Halmos, 1950, Chapter VII) for the definition of the product $\sigma$-algebra.

Having established that the PAM signal (14.73) is a measurable stochastic process 
we would next like to justify the calculations leading to (14.31). To justify the 
swapping of integration and summations in (14.26) we shall need the following 
lemma, which also explains why the sum in (14.27) converges. 

Lemma 14.6.3. If $g(\cdot)$ is a Borel measurable function satisfying the decay condition

$$|g(t)| \le \frac{\beta}{1 + |t/T_s|^{1+\alpha}}, \quad t \in \mathbb{R} \tag{14.74}$$

for some positive $\alpha$, $T_s$, and $\beta$, then

$$\sum_{m=-\infty}^{\infty} \int_{-\infty}^{\infty} |g(t)\, g(t - mT_s)|\,dt < \infty. \tag{14.75}$$



Proof. The decay condition (14.74) guarantees that $g$ is of finite energy. From the
Cauchy-Schwarz Inequality it thus follows that the terms in (14.75) are all finite.
Also, by symmetry, the term in (14.75) corresponding to $m$ is the same as the one
corresponding to $-m$. Consequently, to establish (14.75), it suffices to prove

$$\sum_{m=2}^{\infty} \int_{-\infty}^{\infty} |g(t)\, g(t - mT_s)|\,dt < \infty. \tag{14.76}$$

Define the function

$$g_u(t) \triangleq \begin{cases} 1 & \text{if } |t| \le 1, \\ |t|^{-1-\alpha} & \text{otherwise,} \end{cases} \quad t \in \mathbb{R}.$$

By (14.74) it follows that $|g(t)| \le \beta\, g_u(t/T_s)$ for all $t \in \mathbb{R}$. Consequently,

$$\begin{aligned}
\int_{-\infty}^{\infty} |g(t)\, g(t - mT_s)|\,dt &\le \beta^2 \int_{-\infty}^{\infty} g_u(t/T_s)\, g_u(t/T_s - m)\,dt \\
&= \beta^2\, T_s \int_{-\infty}^{\infty} g_u(\tau)\, g_u(\tau - m)\,d\tau,
\end{aligned}$$

and to establish (14.76) it thus suffices to prove

$$\sum_{m=2}^{\infty} \int_{-\infty}^{\infty} g_u(\tau)\, g_u(\tau - m)\,d\tau < \infty. \tag{14.77}$$

Since the integrand in (14.77) is symmetric around $\tau = m/2$, it follows that

$$\int_{-\infty}^{\infty} g_u(\tau)\, g_u(\tau - m)\,d\tau = 2 \int_{m/2}^{\infty} g_u(\tau)\, g_u(\tau - m)\,d\tau, \tag{14.78}$$

and it thus suffices to establish

$$\sum_{m=2}^{\infty} \int_{m/2}^{\infty} g_u(\tau)\, g_u(\tau - m)\,d\tau < \infty. \tag{14.79}$$

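We next upper-bound the integral in (14.79) for every $m \ge 2$ by first expressing it as

$$\int_{m/2}^{\infty} g_u(\tau)\, g_u(\tau - m)\,d\tau = I_1 + I_2 + I_3,$$

where

$$I_1 \triangleq \int_{m/2}^{m-1} \frac{1}{\tau^{1+\alpha}}\, \frac{1}{(m - \tau)^{1+\alpha}}\,d\tau, \qquad
I_2 \triangleq \int_{m-1}^{m+1} \frac{1}{\tau^{1+\alpha}}\,d\tau, \qquad
I_3 \triangleq \int_{m+1}^{\infty} \frac{1}{\tau^{1+\alpha}}\, \frac{1}{(\tau - m)^{1+\alpha}}\,d\tau.$$

We next upper-bound each of these terms for $m \ge 2$. Starting with $I_1$ we obtain
upon defining $\xi \triangleq m - \tau$

$$\begin{aligned}
I_1 &= \int_{m/2}^{m-1} \frac{1}{\tau^{1+\alpha}}\, \frac{1}{(m - \tau)^{1+\alpha}}\,d\tau \\
&= \int_{1}^{m/2} \frac{1}{(m - \xi)^{1+\alpha}}\, \frac{1}{\xi^{1+\alpha}}\,d\xi \\
&\le \frac{1}{(m/2)^{1+\alpha}} \int_{1}^{m/2} \frac{1}{\xi^{1+\alpha}}\,d\xi \\
&= \frac{2^{1+\alpha}}{\alpha}\, \frac{1}{m^{1+\alpha}} \Bigl(1 - \frac{2^{\alpha}}{m^{\alpha}}\Bigr), \quad m \ge 2,
\end{aligned}$$

which is summable over $m$. As to $I_2$ we have

$$I_2 = \int_{m-1}^{m+1} \frac{1}{\tau^{1+\alpha}}\,d\tau \le \frac{2}{(m - 1)^{1+\alpha}}, \quad m \ge 2,$$

which is summable over $m$. Finally we upper-bound $I_3$ by defining $\xi \triangleq \tau - m$

$$\begin{aligned}
I_3 &= \int_{m+1}^{\infty} \frac{1}{\tau^{1+\alpha}}\, \frac{1}{(\tau - m)^{1+\alpha}}\,d\tau \\
&= \int_{1}^{\infty} \frac{1}{(\xi + m)^{1+\alpha}}\, \frac{1}{\xi^{1+\alpha}}\,d\xi \\
&\le \frac{1}{m^{1+\alpha}} \int_{1}^{\infty} \frac{1}{\xi^{1+\alpha}}\,d\xi \\
&= \frac{1}{\alpha\, m^{1+\alpha}},
\end{aligned}$$

which is summable over $m$. □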



We can now state (14.31) as a theorem. 



Theorem 14.6.4. Let the pulse shape $g\colon \mathbb{R} \to \mathbb{R}$ be a Borel measurable function
satisfying the decay condition (14.17) for some positive $\alpha$, $\beta$, and $T_s$. Let $(X_\ell,\ \ell \in \mathbb{Z})$
be a centered WSS SP of autocovariance function $K_{XX}$ and satisfying the
boundedness condition (14.16) for some $\gamma > 0$. Then the stochastic process (14.73) is
measurable and is of the power $P$ given in (14.31).

Proof. The measurability of $(X(t),\ t \in \mathbb{R})$ follows from Proposition 14.6.2. The
power can be derived as in the derivation of (14.31) from (14.27) with the derivation
of (14.27) now being justifiable by noting that (14.25) follows from (14.71) and by
noting that (14.26) follows from Lemma 14.6.3 and Fubini's Theorem. □

Similarly, we can state (14.37) as a theorem. 

Theorem 14.6.5 (Power in Bi-Infinite Block-Mode PAM). Let $(D_j,\ j \in \mathbb{Z})$ be
IID random bits. Let the $(K, N)$ binary-to-reals encoder enc$\colon \{0,1\}^K \to \mathbb{R}^N$ be
such that enc$(D_1, \ldots, D_K)$ is of zero mean whenever the $K$-tuple $(D_1, \ldots, D_K)$ is
uniformly distributed over $\{0,1\}^K$. Let $(X_\ell,\ \ell \in \mathbb{Z})$ be generated from $(D_j,\ j \in \mathbb{Z})$
in bi-infinite block encoding mode using enc$(\cdot)$. Assume that the pulse shape $g$ is a
Borel measurable function satisfying the decay condition (14.17) for some positive
$\alpha$, $\beta$, and $T_s$. Then the stochastic process (14.73) is measurable and is of the
power $P$ as given in (14.37).

Proof. Measurability follows from Proposition 14.6.2. The derivation of (14.37) is
justified using Fubini's Theorem. □



14.7 Exercises 

Exercise 14.1 (Superimposing Independent Transmissions). Let the two PAM signals
$(X^{(1)}(t))$ and $(X^{(2)}(t))$ be given at every epoch $t \in \mathbb{R}$ by

$$X^{(1)}(t) = A^{(1)} \sum_{\ell=-\infty}^{\infty} X_\ell^{(1)}\, g^{(1)}(t - \ell T_s), \qquad X^{(2)}(t) = A^{(2)} \sum_{\ell=-\infty}^{\infty} X_\ell^{(2)}\, g^{(2)}(t - \ell T_s),$$

where the zero-mean real symbols $(X_\ell^{(1)})$ are generated from the data bits $(D_j^{(1)})$ and
the zero-mean real symbols $(X_\ell^{(2)})$ from $(D_j^{(2)})$. Assume that the bit streams $(D_j^{(1)})$ and
$(D_j^{(2)})$ are independent and that $(X^{(1)}(t))$ and $(X^{(2)}(t))$ are of powers $P^{(1)}$ and $P^{(2)}$.
Find the power in the sum of $(X^{(1)}(t))$ and $(X^{(2)}(t))$.

Exercise 14.2 (The Minimum Distance of a Constellation and Power). Consider the
PAM signal (14.47) where the time shifts of the pulse shape $\phi$ by integer multiples of $T_s$
are orthonormal, and where the symbols $(X_\ell)$ are IID and uniformly distributed over the
set $\{\pm\frac{d}{2}, \pm\frac{3d}{2}, \ldots, \pm(2\nu - 1)\frac{d}{2}\}$. Relate the power in $X(\cdot)$ to the minimum distance $d$ and
the constant $A$.

Exercise 14.3 (PAM with Nonorthogonal Pulses). Let the IID random bits $(D_j,\ j \in \mathbb{Z})$
be modulated using PAM with the pulse shape $g\colon t \mapsto I\{|t| \le T_s\}$ and the repetition
block encoding map $0 \mapsto (+1, +1)$ and $1 \mapsto (-1, -1)$. Compute the average transmitted
power.

Exercise 14.4 (Non-IID Data Bits). Expression (14.37) for the power in bi-infinite block 
mode was derived under the assumption that the data bits are IID. Show that it need 
not otherwise hold. 

Exercise 14.5 (The Power in Nonorthogonal PAM). Consider the PAM signal (14.23)
with the pulse shape $g\colon t \mapsto I\{|t| \le T_s\}$.

(i) Compute the power in $X(\cdot)$ when $(X_\ell)$ are IID of zero-mean and unit-variance.

(ii) Repeat when $(X_\ell)$ is a zero-mean WSS SP of autocovariance function

$$K_{XX}(m) = \begin{cases} 1 & m = 0, \\ \frac{1}{2} & |m| = 1, \\ 0 & \text{otherwise,} \end{cases} \quad m \in \mathbb{Z}.$$

Note that in both parts $E[X_\ell] = 0$ and $E[X_\ell^2] = 1$.

Exercise 14.6 (Pre-Encoding). Rather than applying the mapping enc$\colon \{0,1\}^K \to \mathbb{R}^N$
to the IID random bits $D_1, \ldots, D_K$ directly, we first map the data bits using a one-to-one
mapping $\varphi\colon \{0,1\}^K \to \{0,1\}^K$ to $D'_1, \ldots, D'_K$, and we then map $D'_1, \ldots, D'_K$ using enc
to $X_1, \ldots, X_N$. Does this change the transmitted energy?

Exercise 14.7 (Binary Linear Encoders Producing Pairwise-Independent Symbols).
Binary linear encoders with the antipodal mapping can be described as follows. Using a
deterministic binary $K \times N$ matrix $G$, the encoder first maps the row-vector $\mathbf{d} = (d_1, \ldots, d_K)$
to the row-vector $\mathbf{d}G$, where $\mathbf{d}G$ is computed using matrix multiplication over the binary
field. (Recall that in the binary field multiplication is defined as $0 \cdot 0 = 0 \cdot 1 = 1 \cdot 0 = 0$,
and $1 \cdot 1 = 1$; and addition is modulo 2, so $0 \oplus 0 = 1 \oplus 1 = 0$ and $0 \oplus 1 = 1 \oplus 0 = 1$.)
Thus, the $\ell$-th component $c_\ell$ of $\mathbf{d}G$ is given by

$$c_\ell = d_1 \cdot g^{(1,\ell)} \oplus d_2 \cdot g^{(2,\ell)} \oplus \cdots \oplus d_K \cdot g^{(K,\ell)}.$$

The real symbol $x_\ell$ is then computed according to the rule

$$x_\ell = \begin{cases} +1 & \text{if } c_\ell = 0, \\ -1 & \text{if } c_\ell = 1, \end{cases} \quad \ell = 1, \ldots, N.$$

Let $X_1, X_2, \ldots, X_N$ be the symbols produced by the encoder when it is fed IID random
bits $D_1, D_2, \ldots, D_K$. Show that:

(i) Unless all the entries in the $\ell$-th column of $G$ are zero, $E[X_\ell] = 0$.

(ii) $X_\ell$ is independent of $X_{\ell'}$ if, and only if, the $\ell$-th column and the $\ell'$-th column of $G$
are not identical.

You may find it useful to first prove the following.

(i) If a RV $E$ takes value in the set $\{0, 1\}$, and if $F$ takes on the values 0 and 1 equiprobably
and independently of $E$, then $E \oplus F$ is uniform on $\{0, 1\}$ and independent of $E$.

(ii) If $E_1$ and $E_2$ take value in $\{0, 1\}$, and if $F$ takes on the values 0 and 1 equiprobably
and independently of $(E_1, E_2)$, then $E_1 \oplus F$ is independent of $E_2$.
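
The encoder described in Exercise 14.7 is easy to experiment with. The Python sketch below implements the map from data bits to antipodal symbols and tabulates first and second moments by enumerating all $2^K$ inputs. The function names and the example generator matrix are hypothetical, and the printed moments are only a numerical sanity check: vanishing correlation does not by itself establish the independence claimed in the exercise.

```python
import itertools

import numpy as np


def encode(d, G):
    """Map the binary row vector d to antipodal symbols via c = dG over the binary field."""
    c = (np.asarray(d) @ np.asarray(G)) % 2     # integer product reduced modulo 2
    return 1 - 2 * c                            # 0 -> +1, 1 -> -1


def symbol_moments(G):
    """Enumerate all 2^K equiprobable inputs; return E[X_l] and the matrix E[X_l X_l']."""
    K = len(G)
    X = np.array([encode(d, G) for d in itertools.product((0, 1), repeat=K)],
                 dtype=float)
    return X.mean(axis=0), X.T @ X / len(X)


# Hypothetical 2 x 4 generator matrix: the last column is all-zero,
# and the first three columns are distinct and nonzero.
G = [[1, 0, 1, 0],
     [0, 1, 1, 0]]
means, correlations = symbol_moments(G)
print(means)          # [0. 0. 0. 1.]
print(correlations)
```

Only the symbol produced by the all-zero column has a nonzero mean, and the symbols produced by distinct columns are uncorrelated in this example.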

Exercise 14.8 (Zero-Mean Signals for Linearly Dispersive Channels). Suppose that the
transmitted signal $X$ suffers not only from an additive random disturbance but also
from a deterministic linear distortion. Thus, the received signal $Y$ can be expressed as
$Y = X \star h + N$, where $h$ is a known (deterministic) impulse response, and where $N$ is
an unknown (random) additive disturbance. Show heuristically that transmitting signals
of nonzero mean is power inefficient. How would you mimic the performance of a system
transmitting $X(\cdot)$ using a system transmitting $X(\cdot) - c(\cdot)$?

Exercise 14.9 (The Power in Orthogonal Code-Division Multi-Accessing). Suppose that
the data bits $(D_j^{(1)})$ are mapped to the real symbols $(X_\ell^{(1)})$ and that the data bits $(D_j^{(2)})$
are mapped to $(X_\ell^{(2)})$. Assume that

$$\frac{\bigl(A^{(1)}\bigr)^2}{T_s} \lim_{L\to\infty} \frac{1}{2L+1} \sum_{\ell=-L}^{L} E\Bigl[\bigl(X_\ell^{(1)}\bigr)^2\Bigr] = P^{(1)},$$

and similarly for $P^{(2)}$. Further assume that the time shifts of $\phi$ by integer multiples of $T_s$
are orthonormal and that $\phi$ satisfies the decay condition (14.46). Finally assume that
$(X_\ell^{(1)})$ and $(X_\ell^{(2)})$ are bounded in the sense of (14.16). Compute the power in the signal

$$\sum_{\ell=-\infty}^{\infty} \Bigl[\bigl(A^{(1)} X_\ell^{(1)} + A^{(2)} X_\ell^{(2)}\bigr)\, \phi(t - 2\ell T_s) + \bigl(A^{(1)} X_\ell^{(1)} - A^{(2)} X_\ell^{(2)}\bigr)\, \phi\bigl(t - (2\ell + 1) T_s\bigr)\Bigr].$$



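Exercise 14.10 (More on Orthogonal Code-Division Multi-Accessing). Extend the result
of Exercise 14.9 to the case with $\eta$ data streams, where the transmitted signal is given by

$$\sum_{\ell=-\infty}^{\infty} \Bigl[\bigl(a^{(1,1)} A^{(1)} X_\ell^{(1)} + \cdots + a^{(\eta,1)} A^{(\eta)} X_\ell^{(\eta)}\bigr)\, \phi(t - \eta \ell T_s) + \cdots + \bigl(a^{(1,\eta)} A^{(1)} X_\ell^{(1)} + \cdots + a^{(\eta,\eta)} A^{(\eta)} X_\ell^{(\eta)}\bigr)\, \phi\bigl(t - (\eta \ell + \eta - 1) T_s\bigr)\Bigr]$$

and where the real numbers $a^{(\iota,\nu)}$ for $\iota, \nu \in \{1, \ldots, \eta\}$ satisfy the orthogonality condition

$$\sum_{\nu=1}^{\eta} a^{(\iota,\nu)}\, a^{(\iota',\nu)} = \begin{cases} \eta & \text{if } \iota = \iota', \\ 0 & \text{if } \iota \ne \iota'. \end{cases}$$

The sequence $a^{(\iota,1)}, \ldots, a^{(\iota,\eta)}$ is sometimes called the signature of the $\iota$-th stream.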

Exercise 14.11 (The Samples of the Self-Similarity Function). Let $g\colon \mathbb{R} \to \mathbb{R}$ be of finite
energy, and let $R_{gg}$ be its self-similarity function.

(i) Show that there exists an integrable nonnegative function $G\colon [-1/2, 1/2) \to [0, \infty)$
such that

$$R_{gg}(mT_s) = \int_{-1/2}^{1/2} G(\theta)\, e^{-i 2\pi m \theta}\,d\theta, \quad m \in \mathbb{Z},$$

and such that $G(-\theta) = G(\theta)$ for all $|\theta| < 1/2$. Express $G(\cdot)$ in terms of the FT of $g$.

(ii) Show that if the samples of the self-similarity function are absolutely summable,
i.e., if

$$\sum_{m \in \mathbb{Z}} |R_{gg}(mT_s)| < \infty,$$

then the function

$$\theta \mapsto \sum_{m=-\infty}^{\infty} R_{gg}(mT_s)\, e^{i 2\pi m \theta}, \quad \theta \in [-1/2, 1/2),$$

is such a function, and it is continuous.

(iii) Show that if $(X_\ell)$ is of PSD $S_{XX}$, then the RHS of (14.31) can be expressed as

$$\frac{1}{T_s}\, A^2 \int_{-1/2}^{1/2} G(\theta)\, S_{XX}(\theta)\,d\theta.$$

Exercise 14.12 (A Bound on the Power in PAM). Let $G(\cdot)$ be as in Exercise 14.11.

(i) Show that if $(X_\ell)$ is of zero mean, of unit variance, and has a PSD, then the RHS
of (14.31) is upper-bounded by

$$\frac{1}{T_s}\, A^2 \sup_{-1/2 \le \theta < 1/2} G(\theta). \tag{14.80}$$

(ii) Suppose now that $G(\cdot)$ is continuous. Show that for every $\epsilon > 0$, there exists a
zero-mean unit-variance SP $(X_\ell)$ with a PSD for which the RHS of (14.31) is within $\epsilon$
of (14.80).



Chapter 15 

Operational Power Spectral Density 

15.1 Introduction 

The Power Spectral Density of a stochastic process tells us more about the SP than
just its power. It tells us something about how this power is distributed among
the different frequencies that the SP occupies. The purpose of this chapter is to
clarify this statement and to derive the PSD of PAM signals. Most of this chapter
is written informally with an emphasis on ideas and intuition as opposed to
mathematical rigor. The mathematically-inclined readers will find precise statements
of the key results of this chapter in Section 15.5. We emphasize that this chapter
only deals with real continuous-time stochastic processes.

The classical definition of the PSD of continuous-time stochastic processes
(Definition 25.7.2 ahead) is only applicable to wide-sense stationary stochastic processes,
and PAM signals are not WSS.$^1$ Consequently, we shall have to introduce a new
concept, which we call the operational power spectral density, or the
operational PSD for short.$^2$ This new concept is applicable to a large family of
stochastic processes that includes most WSS processes and most PAM signals.
For WSS stochastic processes, the operational PSD and the classical PSD
coincide (Section 25.14). In addition to being more general, the operational PSD is
more intuitive in that it clarifies the origin of the words "power spectral density."
Moreover, it gives an operational meaning to the concept.

1 If the discrete-time symbol sequence is stationary then the PAM signal is cyclostationary.
But this term will not be used in this book.

2 These terms are not standard. Most of the literature does not seem to distinguish between
the PSD in the sense of Definition 25.7.2 and what we call the operational PSD.

15.2 Motivation

To motivate the new definition we shall first briefly discuss other "densities" such
as charge density, mass density, and probability density.

function                               quantity of interest   per unit of
charge (spatial) density               charge                 space
mass (spatial) density                 mass                   space
mass line density                      mass                   length
probability (per unit of X) density    probability            unit of X
power spectral density                 power                  spectrum (Hz)

Table 15.1: Various densities and their units

In electromagnetism one encounters the concept of charge density, which is often
denoted by $\varrho(\cdot)$. It measures the amount of charge per unit volume. Since the
charge need not be uniformly distributed, $\varrho(\cdot)$ is typically not constant so the charge
density is a function of location. Thus, we usually write $\varrho(x, y, z)$ for the charge
density at the location $(x, y, z)$. This can be defined differentially or integrally.
The differential definition is



g{x,y,z) 



lim 



Charge in Box {(x', y',z') : \x - x'\ < f,|y-j/'| < %,\z-z'\ < f } 

aTo Volume of Box \(x', y', z') : \x - x'\ < § ,\y - y'\ < §,\z-z'\ < f } 

Charge in box {(x ; , J/', z') : \x-x'\ < §,\y-y'\ < f ,|z-z'| < f } 



lim 

A|0 



A :i 



and the integral definition is that a function g(-) is the charge density if for every 
region DcR 3 



Charge in T> 



g(x, y, z) dxdydz, D C 



(x,i/,z)e© 



Ignoring some mathematical subtleties, the two definitions are equivalent. Perhaps 
a more appropriate name for charge density is "Charge Spatial Density," which 
makes it clear that the quantity of interest is charge and that we are interested in 
the way it is distributed in space. The units of g(x,y,z) are those of charge per 
unit volume. 

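Mass density (or, as we would prefer to call it, "Mass Spatial Density") is analogously defined, either differentially, as 

\[
\begin{aligned}
\varrho(x,y,z)
&= \lim_{\Delta \downarrow 0} \frac{\text{Mass in Box } \{(x',y',z') : |x-x'| \le \tfrac{\Delta}{2},\ |y-y'| \le \tfrac{\Delta}{2},\ |z-z'| \le \tfrac{\Delta}{2}\}}{\text{Volume of Box } \{(x',y',z') : |x-x'| \le \tfrac{\Delta}{2},\ |y-y'| \le \tfrac{\Delta}{2},\ |z-z'| \le \tfrac{\Delta}{2}\}} \\
&= \lim_{\Delta \downarrow 0} \frac{\text{Mass in Box } \{(x',y',z') : |x-x'| \le \tfrac{\Delta}{2},\ |y-y'| \le \tfrac{\Delta}{2},\ |z-z'| \le \tfrac{\Delta}{2}\}}{\Delta^3},
\end{aligned}
\]

or integrally as the function $\varrho(x,y,z)$ such that for every subset $\mathcal{D} \subset \mathbb{R}^3$ 

\[
\text{Mass in } \mathcal{D} = \int_{(x,y,z) \in \mathcal{D}} \varrho(x,y,z)\, dx\, dy\, dz, \quad \mathcal{D} \subset \mathbb{R}^3.
\]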



The units are those of mass per unit volume. Since mass is nonnegative, the differential definition of mass density makes it clear that mass density must also be nonnegative. This is slightly less apparent from the integral definition, but (excluding subsets of $\mathbb{R}^3$ of measure zero) is true nonetheless. By convention, if one defines mass density integrally, then one typically insists that the density be nonnegative. 

Similarly, in discussing mass line density one envisions a one-dimensional object, and its density with respect to unit length is defined differentially as 

\[
\varrho(x) = \lim_{\Delta \downarrow 0} \frac{\text{Mass in Interval } \{x' : |x - x'| \le \tfrac{\Delta}{2}\}}{\Delta},
\]

or integrally as the nonnegative function $\varrho(\cdot)$ such that for every subset $\mathcal{D} \subset \mathbb{R}$ of the real line 

\[
\text{Mass in } \mathcal{D} = \int_{x \in \mathcal{D}} \varrho(x)\, dx, \quad \mathcal{D} \subset \mathbb{R}.
\]

The units are units of mass per unit length. 

In probability theory one encounters the probability density function of a random 
variable X . Here the quantity of interest is probability, and we are interested in 
how it is distributed on the real line. The units depend on the units of X. Thus, if 
X measures the time in days until at least one piece in your new china set breaks, 
then the units of the probability density function $f_X(\cdot)$ of $X$ are those of probability (unit-less) per day. The probability density function can be defined differentially as 

\[
f_X(x) = \lim_{\Delta \downarrow 0} \frac{\Pr\bigl[X \in \bigl(x - \tfrac{\Delta}{2},\, x + \tfrac{\Delta}{2}\bigr)\bigr]}{\Delta}
\]

or integrally by requiring that for every subset $\mathcal{E} \subset \mathbb{R}$ 

\[
\Pr[X \in \mathcal{E}] = \int_{x \in \mathcal{E}} f_X(x)\, dx, \quad \mathcal{E} \subset \mathbb{R}. \tag{15.1}
\]

Again, since probabilities are nonnegative, the differential definition makes it clear that the probability density function is nonnegative. In the integral definition we typically add the nonnegativity as a condition. That is, we say that $f_X(\cdot)$ is a density function for the random variable $X$ if $f_X(\cdot)$ is nonnegative and if (15.1) holds. (There is a technical uniqueness issue that we are sweeping under the rug here: if $f_X(\cdot)$ is a probability density function for $X$ and if $\xi(\cdot)$ is a nonnegative function that differs from $f_X(\cdot)$ only on a set of Lebesgue measure zero, then $\xi(\cdot)$ is also a probability density function for $X$.) 
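The differential definition can be illustrated numerically. The following is a minimal Python sketch, assuming a unit-rate exponential random variable (an illustrative choice that does not appear in the text); the names `cdf_exponential` and `differential_density` are hypothetical helpers. It evaluates the ratio inside the limit for shrinking values of Δ and compares it with the known density.

```python
import math

def cdf_exponential(x: float) -> float:
    """CDF of a unit-rate exponential random variable (illustrative choice)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def differential_density(x: float, delta: float) -> float:
    """Pr[X in (x - delta/2, x + delta/2)] / delta, the ratio inside the limit."""
    return (cdf_exponential(x + delta / 2) - cdf_exponential(x - delta / 2)) / delta

x0 = 1.0
true_density = math.exp(-x0)  # f_X(x0) for the unit-rate exponential
ratios = {delta: differential_density(x0, delta) for delta in (1.0, 0.1, 0.001)}
for delta, ratio in ratios.items():
    print(f"delta={delta:<6} ratio={ratio:.8f}  f_X(x0)={true_density:.8f}")
```

As Δ shrinks, the ratio approaches the density at the chosen point, which is what the differential definition asserts.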

With these examples in mind, it is natural to interpret the power spectral density of a stochastic process $(X(t), t \in \mathbb{R})$ as the distribution of the power of $\mathbf{X}$ among the different frequencies. See Table 15.1. Heuristically, we would define the power spectral density $S_{XX}$ at the frequency $f$ differentially as 

\[
\lim_{\Delta \downarrow 0} \frac{\text{Power in the frequencies } \bigl[f - \tfrac{\Delta}{2},\, f + \tfrac{\Delta}{2}\bigr]}{\Delta}
\]

or integrally by requiring that for any subset $\mathcal{D}$ of the spectrum 

\[
\text{Power of } \mathbf{X} \text{ in } \mathcal{D} = \int_{f \in \mathcal{D}} S_{XX}(f)\, df, \quad \mathcal{D} \subset \mathbb{R}. \tag{15.2}
\]

To make this meaningful we next explain what we mean by "the power of $\mathbf{X}$ in the frequencies $\mathcal{D}$." To that end it is best to envision a filter of impulse response $h$ whose frequency response $\hat{h}$ is given by 

\[
\hat{h}(f) = \begin{cases} 1 & \text{if } f \in \mathcal{D}, \\ 0 & \text{otherwise,} \end{cases} \tag{15.3}
\]

and to think of the power of $\mathbf{X}$ in the frequencies $\mathcal{D}$ as the average power at the output of that filter when it is fed $\mathbf{X}$, i.e., the average power of the stochastic process $\mathbf{X} \star h$.³ 

We are now almost ready to give a heuristic definition of the power spectral density. But there are three more points we would like to discuss first. The first is that (15.2) can also be rewritten as 

\[
\text{Power of } \mathbf{X} \text{ in } \mathcal{D} = \int_{\text{all frequencies}} \mathrm{I}\{f \in \mathcal{D}\}\, S_{XX}(f)\, df, \quad \mathcal{D} \subset \mathbb{R}. \tag{15.4}
\]

It turns out that if (15.2) holds for all sets $\mathcal{D} \subset \mathbb{R}$ of frequencies, then the analogous relation holds for all "nice" filters (of a frequency response that is not necessarily $\{0, 1\}$ valued): 

\[
\text{Power of } \mathbf{X} \star h = \int_{\text{all frequencies}} |\hat{h}(f)|^2\, S_{XX}(f)\, df, \quad h \text{ "nice."} \tag{15.5}
\]

That (15.4) typically implies (15.5) can be heuristically argued as follows. By (15.4) the set of frequency responses $\hat{h}$ for which (15.5) holds includes all frequency responses of the form $\hat{h}(f) = \mathrm{I}\{f \in \mathcal{D}\}$. But if (15.5) holds for some frequency response $\hat{h}$, then it must also hold for $\alpha \hat{h}$, where $\alpha$ is any complex number, because scaling the frequency response by $\alpha$ merely multiplies the output power by $|\alpha|^2$. Also, if (15.5) holds for two responses $\hat{h}_1$ and $\hat{h}_2$ for which 

\[
\hat{h}_1(f)\, \hat{h}_2(f) = 0, \quad f \in \mathbb{R}, \tag{15.6}
\]

then it must also hold for $\hat{h}_1 + \hat{h}_2$, because Parseval's Theorem and (15.6) imply that $\mathbf{X} \star h_1$ and $\mathbf{X} \star h_2$ must be orthogonal. Thus, (15.6) implies that the power in $\mathbf{X} \star (h_1 + h_2)$ is the sum of the power in $\mathbf{X} \star h_1$ and the power in $\mathbf{X} \star h_2$. It thus intuitively follows that if (15.4) holds for all subsets $\mathcal{D}$ of the spectrum, then (15.5) holds for all step functions $\hat{h}(f) = \sum_\nu \alpha_\nu\, \mathrm{I}\{f \in \mathcal{D}_\nu\}$, where the sets $\{\mathcal{D}_\nu\}$ are disjoint. And since any "nice" frequency response $\hat{h}$ can be arbitrarily well approximated by such step functions, we expect that (15.5) would hold for all "nice" responses. 

Having heuristically established that (15.2) implies (15.5), we prefer to define the PSD as a function $S_{XX}$ for which (15.5) holds, where "nice" will be taken to mean stable. 

The second point we would like to make is regarding uniqueness. For real stochastic processes it is reasonable to require that (15.5) hold only for filters of real impulse response. Thus we would require 

\[
\text{Power of } \mathbf{X} \star h = \int_{\text{all frequencies}} |\hat{h}(f)|^2\, S_{XX}(f)\, df, \quad h \text{ real and "nice."} \tag{15.7a}
\]



3 We are ignoring the fact that the RHS of (15.3) is typically not the frequency response of a 
stable filter. A stable filter has a continuous frequency response (Theorem 6.2.11 (i)). 
But since for filters of real impulse response the mapping $f \mapsto |\hat{h}(f)|^2$ is symmetric, (15.7a) can be rewritten as 

\[
\text{Power of } \mathbf{X} \star h = \int_0^\infty |\hat{h}(f)|^2 \bigl(S_{XX}(f) + S_{XX}(-f)\bigr)\, df, \quad h \text{ real and "nice."} \tag{15.7b}
\]

This form makes it clear that for real stochastic processes, (15.7a) (or its equivalent form (15.7b)) can only specify the function $f \mapsto S_{XX}(f) + S_{XX}(-f)$; it cannot fully specify the mapping $f \mapsto S_{XX}(f)$. For example, if a symmetric function $S_{XX}$ satisfies (15.7a), then so does 

\[
f \mapsto \begin{cases} 2\, S_{XX}(f) & \text{if } f > 0, \\ 0 & \text{otherwise.} \end{cases}
\]

In fact, if $S_{XX}$ satisfies (15.7a), then so does any function $S(\cdot)$ such that 

\[
S(f) + S(-f) = S_{XX}(f) + S_{XX}(-f), \quad f \in \mathbb{R}.
\]

Thus, for the sake of uniqueness, we define the power spectral density $S_{XX}$ to be a function of frequency that satisfies (15.7a) and that is additionally symmetric. It can be shown that this defines $S_{XX}$ (to within indistinguishability) uniquely. In fact, once one has identified a nonnegative function $S(\cdot)$ such that for any real impulse response $h$ the integral 

\[
\int_{-\infty}^{\infty} S(f)\, |\hat{h}(f)|^2\, df
\]

corresponds to the power in $\mathbf{X} \star h$, then the PSD $S_{XX}$ of $\mathbf{X}$ is given by the symmetrized version of $S(\cdot)$, i.e., 

\[
S_{XX}(f) = \frac{1}{2}\bigl(S(f) + S(-f)\bigr), \quad f \in \mathbb{R}. \tag{15.8}
\]

Note that the differential definition of the PSD would not have resolved the uniqueness issue because a filter of frequency response $f \mapsto \mathrm{I}\bigl\{f \in \bigl[f_0 - \tfrac{\Delta}{2},\, f_0 + \tfrac{\Delta}{2}\bigr]\bigr\}$ is not real. 

The final point we would like to make is regarding additivity. Apart from some mathematical details, what makes the definition of charge density possible is the fact that the total charge in the union of two disjoint regions in space is the sum of charges in the individual regions. The same holds for mass. For the probability densities the crucial property is that the probability of the union of two disjoint events is the sum of the probabilities. Consequently, if $\mathcal{D}_1$ and $\mathcal{D}_2$ are disjoint subsets of $\mathbb{R}$, then $\Pr[X \in \mathcal{D}_1 \cup \mathcal{D}_2] = \Pr[X \in \mathcal{D}_1] + \Pr[X \in \mathcal{D}_2]$. Does this hold for power? In general the power in the sum of two signals is not the sum of the individual powers. But if the signals are orthogonal, then their powers do add. Thus, while Parseval's theorem will not appear explicitly in our analysis of the PSD, it is really what makes it all possible. It demonstrates that if $\mathcal{D}_1, \mathcal{D}_2 \subset \mathbb{R}$ are disjoint frequency bands, then the signals $\mathbf{X} \star h_1$ and $\mathbf{X} \star h_2$ that result when $\mathbf{X}$ is passed through the filters of frequency response $\hat{h}_1(f) = \mathrm{I}\{f \in \mathcal{D}_1\}$ and $\hat{h}_2(f) = \mathrm{I}\{f \in \mathcal{D}_2\}$ are orthogonal, so their powers add. We will not bother to formulate this result precisely, because it does not show up in our analysis explicitly, but it is this result that allows us to define the power spectral density. 
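The additivity of power across disjoint frequency bands can be seen in a small experiment. The sketch below is only a discrete-time, finite-length analogue (circular filtering via the FFT of one sampled path), not the continuous-time statement of the text; the band edges, the signal length, and the helper names `ideal_filter` and `avg_power` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
x = rng.standard_normal(n)                 # one sample path, illustrative only

freqs = np.fft.fftfreq(n)                  # normalized frequencies in [-0.5, 0.5)
band1 = np.abs(freqs) < 0.1                # D_1: symmetric low band
band2 = (np.abs(freqs) >= 0.2) & (np.abs(freqs) < 0.3)  # D_2: disjoint from D_1

def ideal_filter(signal, mask):
    """Keep only the frequency components selected by the (symmetric) mask."""
    return np.fft.ifft(np.fft.fft(signal) * mask).real

def avg_power(signal):
    """Time-averaged power of a finite sampled signal."""
    return float(np.mean(signal ** 2))

x1 = ideal_filter(x, band1)
x2 = ideal_filter(x, band2)
x12 = ideal_filter(x, band1 | band2)       # filter passing both bands

inner = float(np.dot(x1, x2)) / n          # normalized inner product of the outputs
print(avg_power(x12), avg_power(x1) + avg_power(x2), inner)
```

Because the two masks do not overlap, the discrete Parseval relation forces the inner product of the two outputs to vanish, so the power of the sum equals the sum of the powers up to rounding.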
15.3 Defining the Operational PSD 

Recall that in (14.14) we defined the power $P$ in a SP $(Y(t), t \in \mathbb{R})$ as 

\[
P = \lim_{T \to \infty} \frac{1}{2T}\, \mathsf{E}\!\left[\int_{-T}^{T} Y^2(t)\, dt\right]
\]

whenever the limit exists. Thus, the power is the limit, as $T$ tends to infinity, of the ratio of the expected energy in the interval $[-T, T]$ to the interval's duration $2T$. We define the operational power spectral density of a stochastic process as follows. 

Definition 15.3.1 (Operational PSD of a Real SP). We say that the continuous-time real stochastic process $(X(t), t \in \mathbb{R})$ is of operational power spectral density $S_{XX}$ if $(X(t), t \in \mathbb{R})$ is a measurable SP; the mapping $S_{XX} \colon \mathbb{R} \to \mathbb{R}$ is integrable and symmetric; and for every stable real filter of impulse response $h \in \mathcal{L}_1$ the average power at the filter's output when it is fed $(X(t), t \in \mathbb{R})$ is given by 

\[
\int_{-\infty}^{\infty} S_{XX}(f)\, |\hat{h}(f)|^2\, df.
\]



We chose our words very carefully in the above definition, and, in doing so, we 
avoided two issues. The first is whether every SP is of some operational PSD. 
The answer to that is "no." (But most stochastic processes encountered in Digital 
Communications are.) The second issue we avoided is the uniqueness issue. Our 
wording did not indicate whether a SP could be of two different operational PSDs. 
It turns out that if a SP is of two different operational PSDs, then the two are 
equivalent in the sense that they agree except possibly on a set of frequencies of 
Lebesgue measure zero. Consequently, somewhat loosely, we shall speak of the 
operational power spectral density of $(X(t), t \in \mathbb{R})$ even though the uniqueness is 
only to within indistinguishability. The uniqueness is a corollary to the following 
somewhat technical lemma. 

Lemma 15.3.2. 

(i) If $s$ is an integrable function such that 

\[
\int_{-\infty}^{\infty} s(f)\, |\hat{h}(f)|^2\, df = 0 \tag{15.9}
\]

for every integrable complex function $h \colon \mathbb{R} \to \mathbb{C}$, then $s(f)$ is zero for all frequencies outside a set of Lebesgue measure zero. 

(ii) If $s$ is a symmetric function such that (15.9) holds for every integrable real function $h \colon \mathbb{R} \to \mathbb{R}$, then $s(f)$ is zero for all frequencies outside a set of Lebesgue measure zero. 

Proof. We begin with a proof of Part (i). For any $\lambda > 0$ and $f_0 \in \mathbb{R}$ define the function $h \colon \mathbb{R} \to \mathbb{C}$ by 

\[
h(t) = \frac{1}{\sqrt{\lambda}}\, \mathrm{I}\Bigl\{|t| \le \frac{\lambda}{2}\Bigr\}\, e^{i 2\pi f_0 t}, \quad t \in \mathbb{R}. \tag{15.10}
\]

This function is in both $\mathcal{L}_1$ and $\mathcal{L}_2$. Since it is in $\mathcal{L}_2$, its self-similarity function $R_{hh}(\tau)$ is defined at every $\tau \in \mathbb{R}$. In fact, 

\[
R_{hh}(\tau) = \Bigl(1 - \frac{|\tau|}{\lambda}\Bigr)\, \mathrm{I}\{|\tau| \le \lambda\}\, e^{i 2\pi f_0 \tau}, \quad \tau \in \mathbb{R}. \tag{15.11}
\]

And since $h \in \mathcal{L}_1$, it follows from (11.35) that the Fourier Transform of $R_{hh}$ is the mapping $f \mapsto |\hat{h}(f)|^2$. Consequently by Proposition 6.2.3 (i) (with the substitution $R_{hh}$ for $g$), the mapping $f \mapsto |\hat{h}(f)|^2$ can be expressed as the Inverse Fourier Transform of $R_{hh}$. Thus, by (6.9) (with the substitutions of $s$ for $x$ and $R_{hh}$ for $g$), 

\[
\int_{-\infty}^{\infty} s(f)\, |\hat{h}(f)|^2\, df = \int_{-\infty}^{\infty} \hat{s}(f)\, R_{hh}(f)\, df. \tag{15.12}
\]

It now follows from (15.9), (15.12), and (15.11) that 

\[
\int_{-\lambda}^{\lambda} \Bigl(1 - \frac{|f|}{\lambda}\Bigr)\, \hat{s}(f)\, e^{i 2\pi f f_0}\, df = 0, \quad \lambda > 0,\ f_0 \in \mathbb{R}. \tag{15.13}
\]

Part (i) now follows from (15.13) and from Theorem 6.2.12 (ii) (with the substitution of $s$ for $x$ and with the substitution of $f_0$ for $t$). 

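We next turn to Part (ii). For any integrable complex function $h \colon \mathbb{R} \to \mathbb{C}$, define $h_R = \mathrm{Re}(h)$ and $h_I = \mathrm{Im}(h)$ so 

\[
\hat{h}_R(f) = \frac{\hat{h}(f) + \hat{h}^*(-f)}{2}, \qquad \hat{h}_I(f) = \frac{\hat{h}(f) - \hat{h}^*(-f)}{2i}, \quad f \in \mathbb{R}.
\]

Consequently, 

\[
\begin{aligned}
|\hat{h}_R(f)|^2 &= \frac{1}{4}\Bigl(|\hat{h}(f)|^2 + |\hat{h}(-f)|^2 + 2\,\mathrm{Re}\bigl(\hat{h}(f)\, \hat{h}(-f)\bigr)\Bigr), \quad f \in \mathbb{R}, \\
|\hat{h}_I(f)|^2 &= \frac{1}{4}\Bigl(|\hat{h}(f)|^2 + |\hat{h}(-f)|^2 - 2\,\mathrm{Re}\bigl(\hat{h}(f)\, \hat{h}(-f)\bigr)\Bigr), \quad f \in \mathbb{R},
\end{aligned}
\]

and 

\[
|\hat{h}_R(f)|^2 + |\hat{h}_I(f)|^2 = \frac{1}{2}\bigl(|\hat{h}(f)|^2 + |\hat{h}(-f)|^2\bigr), \quad f \in \mathbb{R}. \tag{15.14}
\]

Applying the lemma's hypothesis to the real functions $h_R$ and $h_I$ we obtain 

\[
0 = \int_{-\infty}^{\infty} s(f)\, |\hat{h}_R(f)|^2\, df, \qquad
0 = \int_{-\infty}^{\infty} s(f)\, |\hat{h}_I(f)|^2\, df,
\]

and thus, upon adding the equations, 

\[
\begin{aligned}
0 &= \int_{-\infty}^{\infty} s(f) \bigl(|\hat{h}_R(f)|^2 + |\hat{h}_I(f)|^2\bigr)\, df \\
&= \frac{1}{2} \int_{-\infty}^{\infty} s(f) \bigl(|\hat{h}(f)|^2 + |\hat{h}(-f)|^2\bigr)\, df \\
&= \frac{1}{2} \int_{-\infty}^{\infty} s(f)\, |\hat{h}(f)|^2\, df + \frac{1}{2} \int_{-\infty}^{\infty} s(-f)\, |\hat{h}(f)|^2\, df \\
&= \int_{-\infty}^{\infty} s(f)\, |\hat{h}(f)|^2\, df,
\end{aligned} \tag{15.15}
\]

where the second equality follows from (15.14); the third by writing the integral of the sum as a sum of integrals and by changing the integration variable in the integral involving $\hat{h}(-f)$; and the last equality from the hypothesis that $s$ is symmetric. Since we have established (15.15) for every complex $h \colon \mathbb{R} \to \mathbb{C}$, we can now apply Part (i) to conclude that $s$ is zero at all frequencies outside a set of Lebesgue measure zero. □ 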

Corollary 15.3.3 (Uniqueness of PSD). If both $S_{XX}$ and $S'_{XX}$ are operational PSDs for the real SP $(X(t), t \in \mathbb{R})$, then the set of frequencies at which they differ is of Lebesgue measure zero. 

Proof. Apply Lemma 15.3.2 (ii) to the function $s \colon f \mapsto S_{XX}(f) - S'_{XX}(f)$. □ 

As noted above, we make here no general claims about the existence of opera- 
tional PSDs. Under certain restrictions that are made precise in Section 15.5, the 
operational PSD is defined for PAM signals. And by Theorem 25.13.2, the oper- 
ational PSD always exists for measurable, centered, WSS, stochastic processes of 
integrable autocovariance functions. 

Definition 15.3.4 (Bandlimited Stochastic Processes). We say that a stochastic process $(X(t), t \in \mathbb{R})$ of operational PSD $S_{XX}$ is bandlimited to W Hz if, except on a set of frequencies of Lebesgue measure zero, $S_{XX}(f)$ is zero for all frequencies $f$ satisfying $|f| > \mathrm{W}$. 

The smallest W to which $(X(t), t \in \mathbb{R})$ is limited is called the bandwidth of $(X(t), t \in \mathbb{R})$. 



15.4 The Operational PSD of Real PAM Signals 

Computing the operational PSD of PAM signals is much easier than you might expect. This is because, as we next show, passing a PAM signal of pulse shape $g$ through a stable filter of impulse response $h$ is tantamount to changing its pulse shape from $g$ to $g \star h$: 

\[
\Bigl(\Bigl(\sigma \mapsto A \sum_{\ell} X_\ell\, g(\sigma - \ell T_s)\Bigr) \star h\Bigr)(t) = A \sum_{\ell} X_\ell\, (g \star h)(t - \ell T_s), \quad t \in \mathbb{R}. \tag{15.16}
\]

(For a formal statement of this result, see Corollary 18.6.2, which also addresses the difficulty that arises when the sum is infinite.) Consequently, if one can compute the power in a PAM signal of arbitrary pulse shape (as explained in Chapter 14), then one can also compute the power in a filtered PAM signal. 

That filtering a PAM signal is tantamount to convolving its pulse shape with the impulse response follows from two properties of the convolution: that it is linear 

\[
(\alpha u + \beta v) \star h = \alpha\, u \star h + \beta\, v \star h
\]

and that convolving a delayed version of a signal with $h$ is equivalent to convolving the original signal and delaying the result 

\[
\bigl(\bigl(\sigma \mapsto u(\sigma - t_0)\bigr) \star h\bigr)(t) = (u \star h)(t - t_0), \quad t, t_0 \in \mathbb{R}.
\]

Indeed, if $\mathbf{X}$ is the PAM signal 

\[
X(t) = A \sum_{\ell = -\infty}^{\infty} X_\ell\, g(t - \ell T_s), \tag{15.17}
\]

then (15.16) follows from the calculation 

\[
\begin{aligned}
(\mathbf{X} \star h)(t)
&= \Bigl(\Bigl(\sigma \mapsto A \sum_{\ell = -\infty}^{\infty} X_\ell\, g(\sigma - \ell T_s)\Bigr) \star h\Bigr)(t) \\
&= A \sum_{\ell = -\infty}^{\infty} X_\ell \int_{-\infty}^{\infty} h(s)\, g(t - s - \ell T_s)\, ds \\
&= A \sum_{\ell = -\infty}^{\infty} X_\ell\, (g \star h)(t - \ell T_s), \quad t \in \mathbb{R}.
\end{aligned} \tag{15.18}
\]
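The identity (15.16) has an exact discrete-time counterpart that is easy to check. The following is a minimal sketch on a sampled grid, assuming a finite block of binary symbols, a rectangular pulse, and a decaying impulse response; these choices, the grid resolution `sps`, and the helper `pam` are hypothetical and only serve the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
sps = 8                                     # samples per baud period (assumed grid)
symbols = rng.choice([-1.0, 1.0], size=50)  # X_l, illustrative binary symbols
A = 2.0

g = np.ones(sps)                            # rectangular pulse of one baud period
h = np.exp(-np.arange(4 * sps) / sps)       # decaying impulse response (hypothetical)

def pam(symbols, pulse, amplitude, sps):
    """Sampled PAM signal: amplitude * sum_l X_l pulse(t - l*Ts)."""
    impulses = np.zeros(len(symbols) * sps)
    impulses[::sps] = symbols               # one impulse per baud period
    return amplitude * np.convolve(impulses, pulse)

filtered = np.convolve(pam(symbols, g, A, sps), h)   # PAM signal passed through h
reshaped = pam(symbols, np.convolve(g, h), A, sps)   # PAM signal with pulse g * h

same = np.allclose(filtered, reshaped)
print(same)
```

The two arrays agree because discrete convolution is linear and associative, which mirrors the two properties of the convolution used in the derivation of (15.18).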

We are now ready to apply the results of Chapter 14 on the power in PAM signals 
to study the power in filtered PAM signals and hence to derive the operational 
PSD of PAM signals. We will not treat the case discussed in Section 14.5.3 where 
the only assumption is that the time shifts of the pulse shape by integer multiples 
of $T_s$ are orthonormal, because this orthonormality is typically lost under filtering. 



15.4.1 $(X_\ell, \ell \in \mathbb{Z})$ Are Centered, Uncorrelated, and of Equal Variance 

We begin with the case where the symbols $(X_\ell, \ell \in \mathbb{Z})$ are of zero mean, uncorrelated, and of equal variance $\sigma_X^2$. As in (15.17) we denote the PAM signal by $(X(t), t \in \mathbb{R})$ and study its operational PSD by studying the power in $\mathbf{X} \star h$. Using (15.18) we obtain that $\mathbf{X} \star h$ is the PAM signal $\mathbf{X}$ but with the pulse shape $g$ replaced by $g \star h$. Consequently, using Expression (14.33) for the power in PAM with zero-mean, uncorrelated, variance-$\sigma_X^2$ symbols, we obtain that the power in $\mathbf{X} \star h$ is given by 

\[
\begin{aligned}
\text{Power in } \mathbf{X} \star h
&= \frac{A^2}{T_s}\, \sigma_X^2\, \|g \star h\|_2^2 \\
&= \frac{A^2}{T_s}\, \sigma_X^2 \int_{-\infty}^{\infty} |\hat{g}(f)|^2\, |\hat{h}(f)|^2\, df \\
&= \int_{-\infty}^{\infty} \underbrace{\frac{A^2 \sigma_X^2}{T_s}\, |\hat{g}(f)|^2}_{S_{XX}(f)}\, |\hat{h}(f)|^2\, df,
\end{aligned} \tag{15.19}
\]

where the first equality follows from (14.33) applied to the PAM signal of pulse shape $g \star h$; the second follows from Parseval's Theorem by noting that the Fourier Transform of a convolution of two signals is the product of their Fourier Transforms; and where the third equality follows by rearranging terms. From (15.19) and from the fact that $f \mapsto |\hat{g}(f)|^2$ is a symmetric function (because $g$ is real), it follows that the operational PSD of the PAM signal $(X(t), t \in \mathbb{R})$ when $(X_\ell, \ell \in \mathbb{Z})$ are zero-mean, uncorrelated, and of variance $\sigma_X^2$ is given by 

\[
S_{XX}(f) = \frac{A^2 \sigma_X^2}{T_s}\, |\hat{g}(f)|^2, \quad f \in \mathbb{R}. \tag{15.20}
\]
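The chain of equalities in (15.19) can be sanity-checked on a sampling grid. The sketch below assumes a rectangular pulse, a truncated exponential impulse response, and particular values of $A$, $T_s$, and $\sigma_X^2$; none of these come from the text. It compares the time-domain expression $(A^2/T_s)\,\sigma_X^2\,\|g \star h\|_2^2$ with a Riemann sum of $S_{XX}(f)\,|\hat{h}(f)|^2$, where $S_{XX}$ is taken from (15.20) and the Fourier Transforms are approximated by zero-padded FFTs.

```python
import numpy as np

A, Ts, var_x = 1.5, 1.0, 1.0          # amplitude, baud period, symbol variance (assumed)
dt = Ts / 64                          # time step of the sampling grid (assumed)
t_g = np.arange(0, Ts, dt)
g = np.ones_like(t_g)                 # rectangular pulse on [0, Ts)
t_h = np.arange(0, 8 * Ts, dt)
h = np.exp(-t_h / (0.5 * Ts))         # hypothetical stable filter, truncated

# Time domain: (A^2 / Ts) * var_x * ||g * h||_2^2, with integrals as Riemann sums
gh = np.convolve(g, h) * dt
power_time = (A**2 / Ts) * var_x * np.sum(gh**2) * dt

# Frequency domain: Riemann sum of S_XX(f) |h_hat(f)|^2 with S_XX from (15.20)
n = len(g) + len(h) - 1               # zero-padding avoids circular wrap-around
g_hat = np.fft.fft(g, n) * dt         # FFT-based approximation of the FT of g
h_hat = np.fft.fft(h, n) * dt         # FFT-based approximation of the FT of h
df = 1.0 / (n * dt)
S_xx = (A**2 * var_x / Ts) * np.abs(g_hat) ** 2
power_freq = np.sum(S_xx * np.abs(h_hat) ** 2) * df

print(power_time, power_freq)
```

On this grid the two numbers coincide up to rounding, because the discrete Parseval relation plays the role that Parseval's Theorem plays in the second equality of (15.19).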




15.4.2 $(X_\ell)$ Is Centered and WSS 

The more general case where the symbols $(X_\ell, \ell \in \mathbb{Z})$ are not necessarily uncorrelated but form a centered, WSS, discrete-time SP can be treated with the same ease via (14.31) or (14.32). As above, passing $\mathbf{X}$ through a filter of impulse response $h$ results in a PAM signal with identical symbols but with pulse shape $g \star h$. Consequently, the resulting power can be computed by substituting $g \star h$ for $g$ in (14.32) to obtain that the power in $\mathbf{X} \star h$ is given by 

\[
\text{Power in } \mathbf{X} \star h = \int_{-\infty}^{\infty} \underbrace{\Bigl(\frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m)\, e^{i 2\pi f m T_s}\, |\hat{g}(f)|^2\Bigr)}_{S_{XX}(f)}\, |\hat{h}(f)|^2\, df,
\]

where again we are using the fact that the FT of $g \star h$ is $f \mapsto \hat{g}(f)\, \hat{h}(f)$. The operational PSD is thus 

\[
S_{XX}(f) = \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m)\, e^{i 2\pi f m T_s}\, |\hat{g}(f)|^2, \quad f \in \mathbb{R}, \tag{15.21}
\]

because, as we next argue, the RHS of the above is a symmetric function of $f$. This symmetry follows from the symmetry of $|\hat{g}(\cdot)|$ (because the pulse shape $g$ is real) and from the symmetry of the autocovariance function $K_{XX}$ (because the symbols $(X_\ell, \ell \in \mathbb{Z})$ are real; see (13.12)). Note that (15.21) reduces to (15.20) if 

\[
K_{XX}(m) = \sigma_X^2\, \mathrm{I}\{m = 0\}.
\]
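Formula (15.21) is straightforward to evaluate when the autocovariance has finite support. The following sketch assumes a centered rectangular pulse $g(t) = \mathrm{I}\{|t| \le T_s/2\}$, whose Fourier Transform is $T_s\,\mathrm{sinc}(f T_s)$, and illustrative autocovariance values; the function `pam_psd` and its arguments are hypothetical names introduced here.

```python
import numpy as np

def pam_psd(f, A, Ts, K_xx, g_hat):
    """Evaluate the RHS of (15.21) for an autocovariance of finite support.

    K_xx maps the lag m to K_XX(m); absent lags are treated as zero. K_xx is
    assumed symmetric in m, so the series is real and only rounding-level
    imaginary parts are discarded by taking the real part.
    """
    f = np.asarray(f, dtype=float)
    series = sum(k * np.exp(2j * np.pi * f * m * Ts) for m, k in K_xx.items())
    return (A**2 / Ts) * series.real * np.abs(g_hat(f)) ** 2

A, Ts, var_x = 1.0, 1.0, 2.0                # assumed values for illustration
rect_hat = lambda f: Ts * np.sinc(f * Ts)   # FT of the pulse I{|t| <= Ts/2}

f = np.linspace(-3, 3, 601)
uncorrelated = pam_psd(f, A, Ts, {0: var_x}, rect_hat)
formula_15_20 = (A**2 * var_x / Ts) * np.abs(rect_hat(f)) ** 2

# Correlated symbols with K(1) = K(-1) (illustrative values)
correlated = pam_psd(f, A, Ts, {0: var_x, 1: 0.5 * var_x, -1: 0.5 * var_x}, rect_hat)
```

With $K_{XX}(m) = \sigma_X^2\,\mathrm{I}\{m = 0\}$ the result matches (15.20), and for the correlated example the computed PSD is symmetric in $f$, as argued above.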
15.4.3 The Operational PSD in Bi-Infinite Block-Mode 

We now assume, as in Section 14.5.2, that the (K, N) binary-to-reals block encoder $\mathrm{enc} \colon \{0,1\}^{\mathrm{K}} \to \mathbb{R}^{\mathrm{N}}$ is used in bi-infinite block encoding mode to map the bi-infinite IID random bits $(D_j, j \in \mathbb{Z})$ to the bi-infinite sequence of real numbers $(X_\ell, \ell \in \mathbb{Z})$, and that the transmitted signal is 

\[
X(t) = A \sum_{\ell = -\infty}^{\infty} X_\ell\, g(t - \ell T_s), \tag{15.22}
\]

where $T_s > 0$ is the baud period, and where $g(\cdot)$ is a pulse shape satisfying the decay condition (14.17). We do not assume that the time-shifts of $g(\cdot)$ by integer multiples of $T_s$ are orthogonal, or that the symbols $(X_\ell, \ell \in \mathbb{Z})$ are uncorrelated. We do, however, continue to assume that the N-tuple $\mathrm{enc}(D_1, \ldots, D_{\mathrm{K}})$ is of zero mean whenever $D_1, \ldots, D_{\mathrm{K}}$ are IID random bits. 

We shall determine the operational PSD of $\mathbf{X}$ by computing the power of the signal that results when $\mathbf{X}$ is fed to a stable filter of impulse response $h$. As before, we note that feeding $\mathbf{X}$ through a filter of impulse response $h$ is tantamount to replacing its pulse shape $g$ by $g \star h$. The power of this output signal can thus be computed from our expression for the power in bi-infinite block encoding with PAM signaling (14.38) but with the pulse shape being $g \star h$ and hence of FT $f \mapsto \hat{g}(f)\, \hat{h}(f)$: 

\[
\text{Power in } \mathbf{X} \star h = \int_{-\infty}^{\infty} \underbrace{\Bigl(\frac{A^2}{\mathrm{N} T_s} \sum_{\ell=1}^{\mathrm{N}} \sum_{\ell'=1}^{\mathrm{N}} \mathsf{E}[X_\ell X_{\ell'}]\, e^{i 2\pi f (\ell - \ell') T_s}\, |\hat{g}(f)|^2\Bigr)}_{S_{XX}(f)}\, |\hat{h}(f)|^2\, df.
\]

As we next show, the underbraced term is a symmetric function of $f$, and we thus conclude that the PSD of $\mathbf{X}$ is: 

\[
S_{XX}(f) = \frac{A^2}{\mathrm{N} T_s} \sum_{\ell=1}^{\mathrm{N}} \sum_{\ell'=1}^{\mathrm{N}} \mathsf{E}[X_\ell X_{\ell'}]\, e^{i 2\pi f (\ell - \ell') T_s}\, |\hat{g}(f)|^2, \quad f \in \mathbb{R}. \tag{15.23}
\]

To see that the RHS of (15.23) is a symmetric function of $f$, use the identities 

\[
\sum_{\ell=1}^{\mathrm{N}} \sum_{\ell'=1}^{\mathrm{N}} a_{\ell,\ell'} = \sum_{\ell=1}^{\mathrm{N}} a_{\ell,\ell} + \sum_{\ell=1}^{\mathrm{N}} \sum_{\ell'=1}^{\ell-1} \bigl(a_{\ell,\ell'} + a_{\ell',\ell}\bigr)
\]

and $\mathsf{E}[X_\ell X_{\ell'}] = \mathsf{E}[X_{\ell'} X_\ell]$ to rewrite the RHS of (15.23) in the symmetric form 

\[
\frac{A^2}{\mathrm{N} T_s} \Bigl(\sum_{\ell=1}^{\mathrm{N}} \mathsf{E}[X_\ell^2] + \sum_{\ell=1}^{\mathrm{N}} \sum_{\ell'=1}^{\ell-1} 2\, \mathsf{E}[X_\ell X_{\ell'}] \cos\bigl(2\pi f (\ell - \ell') T_s\bigr)\Bigr)\, |\hat{g}(f)|^2.
\]
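The equivalence of the exponential form (15.23) and the cosine form can be checked for a concrete encoder. The sketch below assumes a hypothetical (K, N) = (1, 2) encoder that maps the bit 0 to (+1, -1) and the bit 1 to (-1, +1), equiprobable bits, and a centered rectangular pulse; these choices and all function names are illustrative and do not come from the text. Indices run from 0 to N-1 in the code, which leaves the differences $\ell - \ell'$ unchanged.

```python
import itertools
import numpy as np

def enc(bits):
    """Hypothetical (K, N) = (1, 2) encoder: 0 -> (+1, -1), 1 -> (-1, +1)."""
    return (1.0, -1.0) if bits[0] == 0 else (-1.0, 1.0)

K, N = 1, 2
codewords = np.array([enc(b) for b in itertools.product((0, 1), repeat=K)])
mean = codewords.mean(axis=0)                     # equiprobable bits: plain average
corr = codewords.T @ codewords / len(codewords)   # corr[l, l'] = E[X_l X_l']

A, Ts = 1.0, 1.0
g_hat = lambda f: Ts * np.sinc(f * Ts)            # FT of I{|t| <= Ts/2} (assumed pulse)

def psd_exponential_form(f):
    """RHS of (15.23): double sum of complex exponentials."""
    total = 0j
    for l in range(N):
        for lp in range(N):
            total += corr[l, lp] * np.exp(2j * np.pi * f * (l - lp) * Ts)
    return (A**2 / (N * Ts)) * total * abs(g_hat(f)) ** 2

def psd_cosine_form(f):
    """Symmetric form: diagonal terms plus cosine terms for l' < l."""
    total = sum(corr[l, l] for l in range(N))
    for l in range(N):
        for lp in range(l):
            total += 2 * corr[l, lp] * np.cos(2 * np.pi * f * (l - lp) * Ts)
    return (A**2 / (N * Ts)) * total * abs(g_hat(f)) ** 2

test_freqs = [-1.3, -0.25, 0.0, 0.4, 0.75]
for f in test_freqs:
    print(f, psd_exponential_form(f), psd_cosine_form(f))
```

For this encoder the codeword mean is zero, as the text requires, and the two forms agree at every test frequency with a vanishing imaginary part.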



From (15.23) we obtain: 

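Theorem 15.4.1 (The Bandwidth of PAM Is that of the Pulse Shape). Suppose that the operational PSD in bi-infinite block-mode of a PAM signal $(X(t))$ is as given in (15.23), e.g., that the conditions of Theorem 15.5.2 ahead are satisfied. Further assume 

\[
A > 0, \qquad \sum_{\ell=1}^{\mathrm{N}} \mathsf{E}[X_\ell^2] > 0, \tag{15.24}
\]

e.g., that $(X(t))$ is not deterministically zero. Then the bandwidth of the SP $(X(t))$ is equal to the bandwidth of the pulse shape $g$. 

Proof. If $g$ is bandlimited to W Hz, then so is $(X(t))$, because, by (15.23), 

\[
\bigl(\hat{g}(f) = 0\bigr) \implies \bigl(S_{XX}(f) = 0\bigr).
\]

We next complete the proof by showing that there are at most a countable number of frequencies $f$ such that $S_{XX}(f) = 0$ but $\hat{g}(f) \ne 0$. From (15.23) it follows that to show this it suffices to show that there are at most a countable number of frequencies $f$ such that $\sigma(f) = 0$, where 

\[
\begin{aligned}
\sigma(f) &\triangleq \frac{A^2}{\mathrm{N} T_s} \sum_{\ell=1}^{\mathrm{N}} \sum_{\ell'=1}^{\mathrm{N}} \mathsf{E}[X_\ell X_{\ell'}]\, e^{i 2\pi f (\ell - \ell') T_s} \\
&= \sum_{m=-\mathrm{N}+1}^{\mathrm{N}-1} \gamma_m\, e^{i 2\pi f m T_s} \\
&= \sum_{m=-\mathrm{N}+1}^{\mathrm{N}-1} \gamma_m\, z^m \Big|_{z = e^{i 2\pi f T_s}},
\end{aligned} \tag{15.25}
\]

and 

\[
\gamma_m = \frac{A^2}{\mathrm{N} T_s} \sum_{\ell = \max\{1,\, m+1\}}^{\min\{\mathrm{N},\, \mathrm{N}+m\}} \mathsf{E}[X_\ell X_{\ell - m}], \quad m \in \{-\mathrm{N}+1, \ldots, \mathrm{N}-1\}. \tag{15.26}
\]

It follows from (15.25) that $\sigma(f)$ is zero if, and only if, $e^{i 2\pi f T_s}$ is a root of the mapping 

\[
z \mapsto \sum_{m=-\mathrm{N}+1}^{\mathrm{N}-1} \gamma_m\, z^m.
\]

Since $e^{i 2\pi f T_s}$ is of unit magnitude, it follows that $\sigma(f)$ is zero if, and only if, $e^{i 2\pi f T_s}$ is a root of the polynomial 

\[
z \mapsto \sum_{\nu=0}^{2\mathrm{N}-2} \gamma_{\nu - \mathrm{N} + 1}\, z^{\nu}. \tag{15.27}
\]

From (15.26) and (15.24) it follows that $\gamma_0 > 0$, so the polynomial in (15.27) is not zero. Consequently, since it is of degree at most $2\mathrm{N} - 2$, it has at most $2\mathrm{N} - 2$ distinct roots and, a fortiori, at most $2\mathrm{N} - 2$ distinct roots of unit magnitude. Denote these roots by 

\[
e^{i\theta_1}, \ldots, e^{i\theta_d},
\]

where $d \le 2\mathrm{N} - 2$ and $\theta_1, \ldots, \theta_d \in [-\pi, \pi)$. Since $f$ satisfies $e^{i 2\pi f T_s} = e^{i\theta}$ if, and only if, 

\[
f = \frac{\theta}{2\pi T_s} + \frac{\eta}{T_s}
\]

for some $\eta \in \mathbb{Z}$, we conclude that the set of frequencies $f$ satisfying $\sigma(f) = 0$ is the set 

\[
\Bigl\{\frac{\theta_1}{2\pi T_s} + \frac{\eta}{T_s} : \eta \in \mathbb{Z}\Bigr\} \cup \cdots \cup \Bigl\{\frac{\theta_d}{2\pi T_s} + \frac{\eta}{T_s} : \eta \in \mathbb{Z}\Bigr\},
\]

and is thus countable. (The union of a finite (or countable) number of countable sets is countable.) □ 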
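The root-counting argument of the proof can be followed numerically. The sketch below assumes the correlation matrix of a hypothetical encoder with N = 2 that maps a bit to (+1, -1) or (-1, +1), so that $\mathsf{E}[X_1 X_2] = -1$; it builds the coefficients $\gamma_m$ of (15.26), finds the roots of the polynomial (15.27) with `numpy.roots`, and keeps those of (numerically) unit magnitude. Indices in the code are 0-based, and `numpy.angle` returns angles in $(-\pi, \pi]$ rather than $[-\pi, \pi)$, which only affects how a root at $-1$ would be labeled.

```python
import numpy as np

A, Ts, N = 1.0, 1.0, 2
# corr[l, l'] = E[X_l X_l'] for the hypothetical encoder described above
corr = np.array([[1.0, -1.0], [-1.0, 1.0]])

def gamma(m):
    """gamma_m of (15.26), written with 0-based indices l = 0..N-1."""
    ls = range(max(0, m), min(N, N + m))
    return (A**2 / (N * Ts)) * sum(corr[l, l - m] for l in ls)

lags = list(range(-N + 1, N))
gammas = [gamma(m) for m in lags]            # gamma_{-N+1}, ..., gamma_{N-1}

def sigma(f):
    """sigma(f) of (15.25), evaluated as a trigonometric polynomial."""
    return sum(gam * np.exp(2j * np.pi * f * m * Ts)
               for gam, m in zip(gammas, lags)).real

# Polynomial (15.27): the coefficient of z^nu is gamma_{nu-N+1}. numpy.roots
# expects the highest power first, hence the reversal.
roots = np.roots(gammas[::-1])
unit_roots = roots[np.abs(np.abs(roots) - 1.0) < 1e-6]
thetas = np.angle(unit_roots)
base_zeros = thetas / (2 * np.pi * Ts)       # remaining zeros: add eta/Ts, eta integer

print(gammas, unit_roots, base_zeros)
```

For this example the polynomial is proportional to $(z - 1)^2$, so the only unit-magnitude root is $z = 1$ and $\sigma(f)$ vanishes exactly at the integer multiples of $1/T_s$, a countable set as the theorem's proof requires.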



15.5 A More Formal Account 

In this section we shall give a more formal account of the power at the output of 
a stable filter that is fed a PAM signal. There are two approaches to this. The 
first is based on carefully justifying the steps in our informal derivation.⁴ This 
approach is pursued in Section 18.6.5, where the results are generalized to complex 
pulse shapes and complex symbols. The second approach is to convert the problem 
into one about WSS stochastic processes and to then rely heavily on Sections 25.13 
and 25.14 on the filtering of WSS stochastic processes and, in particular, on the 
Wiener-Khinchin Theorem (Theorem 25.14.1). For the benefit of readers who have 
already encountered the Wiener-Khinchin Theorem we follow this latter approach 
here. We ask the readers to note that the Wiener-Khinchin Theorem is not directly 
applicable here because the PAM signal is not WSS. A "stationarization argument" 
is thus needed. 

The key results of this section are the following two theorems. 

Theorem 15.5.1. Consider the setup of Theorem 14.6.4 with the additional assumption that the autocovariance function $K_{XX}$ of $(X_\ell)$ is absolutely summable: 

\[
\sum_{m=-\infty}^{\infty} |K_{XX}(m)| < \infty. \tag{15.28}
\]

Let $h \in \mathcal{L}_1$ be the impulse response of a stable real filter. Then: 

(i) The PAM signal 

\[
\mathbf{X} \colon (\omega, t) \mapsto A \sum_{\ell=-\infty}^{\infty} X_\ell(\omega)\, g(t - \ell T_s) \tag{15.29}
\]

is bounded in the sense that there exists a constant $\Gamma$ such that 

\[
|X(\omega, t)| < \Gamma, \quad (\omega \in \Omega,\ t \in \mathbb{R}). \tag{15.30}
\]

(ii) For every $\omega \in \Omega$ the convolution of the sample-path $t \mapsto X(\omega, t)$ with $h$ is defined at every epoch. 

(iii) The stochastic process 

\[
(\omega, t) \mapsto \int_{-\infty}^{\infty} X(\omega, \sigma)\, h(t - \sigma)\, d\sigma, \quad (\omega \in \Omega,\ t \in \mathbb{R}) \tag{15.31}
\]

that results when the sample-paths of $\mathbf{X}$ are convolved with $h$ is a measurable stochastic process of power 

\[
P = \int_{-\infty}^{\infty} \Bigl(\frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m)\, e^{i 2\pi f m T_s}\, |\hat{g}(f)|^2\Bigr)\, |\hat{h}(f)|^2\, df. \tag{15.32}
\]

4 The main difficulties in the justification are in making (15.16) rigorous and in controlling the decay of $g \star h$ for arbitrary $h \in \mathcal{L}_1$. 



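Theorem 15.5.2. Consider the setup of Theorem 14.6.5. Let $h \in \mathcal{L}_1$ be the impulse response of a real stable filter. Then: 

(i) The sample-paths of the PAM stochastic process 

\[
\mathbf{X} \colon (\omega, t) \mapsto A \sum_{\ell=-\infty}^{\infty} X_\ell(\omega)\, g(t - \ell T_s) \tag{15.33}
\]

are bounded in the sense of (15.30). 

(ii) For every $\omega \in \Omega$ the convolution of the sample-path $t \mapsto X(\omega, t)$ and $h$ is defined at every epoch. 

(iii) The stochastic process $(X(t), t \in \mathbb{R}) \star h$ that results when the sample-paths of $\mathbf{X}$ are convolved with $h$ is a measurable stochastic process of power 

\[
P = \int_{-\infty}^{\infty} \Bigl(\frac{A^2}{\mathrm{N} T_s} \sum_{\ell=1}^{\mathrm{N}} \sum_{\ell'=1}^{\mathrm{N}} \mathsf{E}[X_\ell X_{\ell'}]\, e^{i 2\pi f (\ell - \ell') T_s}\, |\hat{g}(f)|^2\Bigr)\, |\hat{h}(f)|^2\, df, \tag{15.34}
\]

where $(X_1, \ldots, X_{\mathrm{N}}) = \mathrm{enc}(D_1, \ldots, D_{\mathrm{K}})$, and where $D_1, \ldots, D_{\mathrm{K}}$ are IID random bits. 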

Proof of Theorem 15.5.1. Part (i) is a consequence of the assumption that $(X_\ell)$ is bounded in the sense of (14.16) and that the pulse shape $g$ decays faster than $1/t$ in the sense of (14.17). 

Part (ii) is a consequence of the fact that the convolution of a bounded function with an integrable function is defined at every epoch; see Section 5.5. 

We next turn to Part (iii). The proof of the measurability of the convolution of $(X(t), t \in \mathbb{R})$ with $h$ is a bit technical. It is very similar to the proof of Theorem 25.13.2 (i). As in that proof, we first note that it suffices to prove the result for functions $h$ that are Borel measurable; the extension to Lebesgue measurable functions will then follow by approximating $h$ by a Borel measurable function that differs from it on a set of Lebesgue measure zero (Rudin, 1974, Chapter 7, Lemma 1) and by then noting that the convolution of $t \mapsto X(\omega, t)$ with $h$ is unaltered when $h$ is replaced by a function that differs from it on a set of Lebesgue measure zero. We thus assume that $h$ is Borel measurable. Consequently, the mapping from $\mathbb{R}^2$ to $\mathbb{R}$ defined by $(t, \sigma) \mapsto h(t - \sigma)$ is also Borel measurable, because it is the composition of the continuous (and hence Borel measurable) mapping $(t, \sigma) \mapsto t - \sigma$ with the Borel measurable mapping $t \mapsto h(t)$. 

As in the proof of Theorem 25.13.2, we prove the measurability of the convolution of $(X(t), t \in \mathbb{R})$ with $h$ by proving the measurability of the mapping defined by $(\omega, t) \mapsto (1 + t^2)^{-1} \int_{-\infty}^{\infty} X(\omega, \sigma)\, h(t - \sigma)\, d\sigma$. To this end we study the function 

\[
\bigl((\omega, t), \sigma\bigr) \mapsto \frac{X(\omega, \sigma)\, h(t - \sigma)}{1 + t^2}, \quad \bigl((\omega, t) \in \Omega \times \mathbb{R},\ \sigma \in \mathbb{R}\bigr). \tag{15.35}
\]

This function is measurable because, as noted above, $(t, \sigma) \mapsto h(t - \sigma)$ is measurable; because, by Proposition 14.6.2, $(X(t), t \in \mathbb{R})$ is measurable; and because the product of Borel measurable functions is Borel measurable (Rudin, 1974, Chapter 1, Section 1.9 (c)). Moreover, using (15.30) and Fubini's Theorem it can be readily verified that this function is integrable. Using Fubini's Theorem again, we conclude that the function 

\[
(\omega, t) \mapsto \frac{1}{1 + t^2} \int_{-\infty}^{\infty} X(\omega, \sigma)\, h(t - \sigma)\, d\sigma
\]

is measurable. Consequently, so is $\mathbf{X} \star h$. 

To conclude the proof we now need to compute the power in the measurable (non- 
stationary) SP X*h. This will be done in a roundabout way. We shall first define 
a new SP X'. This SP is centered, measurable, and WSS so the power in X'*h can 
be computed using Theorem 25.14.1. We shall then show that the powers of X*h 
and X' * h are equal and hence that from the power in X' * h we can immediately 
obtain the power in X * h. 

We begin by defining the SP $(X'(t), t \in \mathbb{R})$ as 

\[
X'(t) = X(t + S), \quad t \in \mathbb{R}, \tag{15.36a}
\]

where $S$ is independent of $(X(t))$ and uniformly distributed over the interval $[0, T_s]$, 

\[
S \sim \mathcal{U}([0, T_s]). \tag{15.36b}
\]

That $(X'(t))$ is centered follows from the calculation 

\[
\begin{aligned}
\mathsf{E}[X'(t)] &= \mathsf{E}[X(t + S)] \\
&= \int_0^{T_s} \frac{1}{T_s}\, \mathsf{E}[X(t + s)]\, ds \\
&= 0,
\end{aligned}
\]

where the first equality follows from the definition of $(X'(t))$; the second from the independence of $(X(t))$ and $S$ and from the specific form of the density of $S$; and the third because $(X(t))$ is centered. That $(X'(t))$ is measurable follows because the mapping $((\omega, s), t) \mapsto X(\omega, t + s)$ can be written as the composition of the mapping $((\omega, s), t) \mapsto (\omega, t + s)$ with the mapping $(\omega, t) \mapsto X(\omega, t)$. And that it is WSS follows from the calculation 

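\[
\begin{aligned}
\mathsf{E}[X'(t)\, X'(t + \tau)]
&= \mathsf{E}[X(t + S)\, X(t + S + \tau)] \\
&= \frac{1}{T_s} \int_0^{T_s} \mathsf{E}[X(t + s)\, X(t + s + \tau)]\, ds \\
&= \frac{A^2}{T_s} \int_0^{T_s} \mathsf{E}\Bigl[\sum_{\ell=-\infty}^{\infty} X_\ell\, g(t + s - \ell T_s) \sum_{\ell'=-\infty}^{\infty} X_{\ell'}\, g(t + s + \tau - \ell' T_s)\Bigr]\, ds \\
&= \frac{A^2}{T_s} \sum_{\ell=-\infty}^{\infty} \sum_{\ell'=-\infty}^{\infty} \mathsf{E}[X_\ell X_{\ell'}] \int_0^{T_s} g(t + s - \ell T_s)\, g(t + s + \tau - \ell' T_s)\, ds \\
&= \frac{A^2}{T_s} \sum_{\ell=-\infty}^{\infty} \sum_{\ell'=-\infty}^{\infty} K_{XX}(\ell - \ell') \int_0^{T_s} g(t + s - \ell T_s)\, g(t + s + \tau - \ell' T_s)\, ds \\
&= \frac{A^2}{T_s} \sum_{\ell=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} K_{XX}(m) \int_0^{T_s} g(t + s - \ell T_s)\, g\bigl(t + s + \tau - (\ell - m) T_s\bigr)\, ds \\
&= \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m) \sum_{\ell=-\infty}^{\infty} \int_{-\ell T_s + t}^{-\ell T_s + T_s + t} g(\xi)\, g(\xi + \tau + m T_s)\, d\xi \\
&= \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m) \int_{-\infty}^{\infty} g(\xi)\, g(\xi + \tau + m T_s)\, d\xi \\
&= \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m)\, R_{gg}(m T_s + \tau), \quad \tau, t \in \mathbb{R}.
\end{aligned} \tag{15.37}
\]

Note that (15.37) also shows that $(X'(t))$ is of PSD (as defined in Definition 25.7.2) 

\[
S_{X'X'}(f) = \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m)\, e^{i 2\pi f m T_s}\, |\hat{g}(f)|^2, \quad f \in \mathbb{R}, \tag{15.38}
\]

which is integrable by the absolute summability of $K_{XX}$. 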
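The effect of the random shift can be seen in a small numerical experiment. The sketch below assumes zero-mean uncorrelated symbols of variance $\sigma_X^2$, so that $K_{XX}(m) = \sigma_X^2\,\mathrm{I}\{m = 0\}$, and a rectangular pulse $g(t) = \mathrm{I}\{0 \le t < T_s\}$, for which $R_{gg}(\tau) = \max\{T_s - |\tau|, 0\}$; these choices and the helper names are illustrative only. It shows that $\mathsf{E}[X(t)\,X(t+\tau)]$ depends on $t$, while its average over a uniform shift matches (15.37).

```python
import numpy as np

A, Ts, var_x = 1.0, 1.0, 1.0                 # assumed values for illustration

def g(t):
    """Rectangular pulse I{0 <= t < Ts} (an illustrative pulse shape)."""
    return ((t >= 0) & (t < Ts)).astype(float)

def corr_pam(t, tau, n_terms=5):
    """E[X(t) X(t+tau)] for zero-mean uncorrelated symbols of variance var_x."""
    ells = np.arange(-n_terms, n_terms + 1)
    return A**2 * var_x * np.sum(g(t - ells * Ts) * g(t + tau - ells * Ts))

tau = 0.3 * Ts
# The correlation depends on t, so the PAM signal itself is not WSS.
early, late = corr_pam(0.1 * Ts, tau), corr_pam(0.9 * Ts, tau)

# Average over a uniform shift in [0, Ts) using a midpoint rule.
shifts = (np.arange(1000) + 0.5) * Ts / 1000
averaged = np.mean([corr_pam(s, tau) for s in shifts])

# (15.37) with K_XX(m) = var_x * I{m = 0} and R_gg(tau) = Ts - |tau| for this pulse
predicted = (A**2 / Ts) * var_x * max(Ts - abs(tau), 0.0)

print(early, late, averaged, predicted)
```

The unaveraged correlation takes different values at the two epochs, whereas the shift-averaged value agrees with the stationarized autocovariance, which is the point of introducing $S$.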

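Defining $(Y'(t), t \in \mathbb{R})$ to be $(X'(t), t \in \mathbb{R}) \star h$ we can now use Theorem 25.14.1 to compute the power in $(Y'(t), t \in \mathbb{R})$: 

\[
\lim_{T \to \infty} \frac{1}{2T}\, \mathsf{E}\!\left[\int_{-T}^{T} \bigl(Y'(t)\bigr)^2\, dt\right] = \int_{-\infty}^{\infty} \Bigl(\frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m)\, e^{i 2\pi f m T_s}\, |\hat{g}(f)|^2\Bigr)\, |\hat{h}(f)|^2\, df.
\]

To conclude the proof we next show that the power in $\mathbf{Y} = \mathbf{X} \star h$ is the same as the power in $\mathbf{Y}'$. To that end we first note that from (15.36a) it follows that 

\[
(\mathbf{X}' \star h)\bigl((\omega, s), t\bigr) = (\mathbf{X} \star h)(\omega, t + s), \quad (\omega \in \Omega,\ 0 \le s \le T_s,\ t \in \mathbb{R}),
\]

i.e., that 

\[
Y'\bigl((\omega, s), t\bigr) = Y(\omega, t + s), \quad (\omega \in \Omega,\ 0 \le s \le T_s,\ t \in \mathbb{R}). \tag{15.39}
\]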
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It thus follows that 

Y 2 {uj,t)dt< (r'((tj,s),i)) 2 di, (w e o, o < s < t s , t ell 

T ./-T-T s ^ ' 



(15.40) 



because 



T /-T 

(y'((^,s),t)) 2 di= / r 2 (cj,t + s)di 

T-T s J-J-J B 

T+s 

y 2 (w,CT)dcr 
-T-T s +s 

> / y 2 (w,cr)dfJ, < S < T s , 

where the equality in the first line follows from (15.39); the equality in the second 
line from the substitution a = t+ s; and the final inequality from the nonnegativity 
of the integrand and because < s < T s . 

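Similarly

$$\int_{-T}^{T} Y^2(\omega,t)\,dt \ge \int_{-T}^{T-T_s} \bigl(Y'((\omega,s),t)\bigr)^2\,dt, \quad \bigl(\omega\in\Omega,\ 0\le s\le T_s,\ t\in\mathbb{R}\bigr), \tag{15.41}$$

because

$$\begin{aligned}
\int_{-T}^{T-T_s} \bigl(Y'((\omega,s),t)\bigr)^2\,dt &= \int_{-T}^{T-T_s} Y^2(\omega,t+s)\,dt \\
&= \int_{-T+s}^{T-T_s+s} Y^2(\omega,\sigma)\,d\sigma \\
&\le \int_{-T}^{T} Y^2(\omega,\sigma)\,d\sigma, \quad 0\le s\le T_s.
\end{aligned}$$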

Combining (15.40) and (15.41) and using the nonnegativity of the integrand we
obtain that for every $\omega \in \Omega$ and $s \in [0, T_s]$

$$\int_{-T+T_s}^{T-T_s} \bigl(Y'((\omega,s),t)\bigr)^2\,dt \le \int_{-T}^{T} Y^2(\omega,\sigma)\,d\sigma \le \int_{-T-T_s}^{T+T_s} \bigl(Y'((\omega,s),t)\bigr)^2\,dt. \tag{15.42}$$

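Dividing by $2T$ and taking expectations we obtain

$$\frac{2T-2T_s}{2T}\,\frac{1}{2T-2T_s}\, E\Bigl[\int_{-T+T_s}^{T-T_s} \bigl(Y'(t)\bigr)^2\,dt\Bigr] \le \frac{1}{2T}\, E\Bigl[\int_{-T}^{T} Y^2(\sigma)\,d\sigma\Bigr] \le \frac{2T+2T_s}{2T}\,\frac{1}{2T+2T_s}\, E\Bigl[\int_{-T-T_s}^{T+T_s} \bigl(Y'(t)\bigr)^2\,dt\Bigr], \tag{15.43}$$

from which the equality between the power in $Y'$ and in $Y$ follows by letting $T$
tend to infinity and using the Sandwich Theorem. □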
Proof of Theorem 15.5.2. The proof of Theorem 15.5.2 is very similar to the proof
of Theorem 15.5.1, so most of the details will be omitted. The main difference is
that the process $(X'(t), t \in \mathbb{R})$ is now defined as

$$X'(t) = X(t + S),$$

where the random variable $S$ is now uniformly distributed over the interval $[0, NT_s]$,

$$S \sim \mathcal{U}([0, NT_s]).$$

With this definition, the autocovariance of $(X'(t), t \in \mathbb{R})$ can be computed as

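$$\begin{aligned}
K_{X'X'}(\tau) &= E\bigl[X(t+S)\,X(t+\tau+S)\bigr] \\
&= \frac{1}{NT_s}\int_0^{NT_s} E\bigl[X(t+s)\,X(t+\tau+s)\bigr]\,ds \\
&= \frac{A^2}{NT_s}\int_0^{NT_s} E\Bigl[\Bigl(\sum_{\nu=-\infty}^{\infty} u(\mathbf{X}_\nu, t+s-\nu NT_s)\Bigr)\Bigl(\sum_{\nu'=-\infty}^{\infty} u(\mathbf{X}_{\nu'}, t+\tau+s-\nu' NT_s)\Bigr)\Bigr]\,ds \\
&= \frac{A^2}{NT_s}\int_0^{NT_s} \sum_{\nu=-\infty}^{\infty}\sum_{\nu'=-\infty}^{\infty} E\bigl[u(\mathbf{X}_\nu, t+s-\nu NT_s)\, u(\mathbf{X}_{\nu'}, t+\tau+s-\nu' NT_s)\bigr]\,ds \\
&= \frac{A^2}{NT_s}\int_0^{NT_s} \sum_{\nu=-\infty}^{\infty} E\bigl[u(\mathbf{X}_\nu, t+s-\nu NT_s)\, u(\mathbf{X}_{\nu}, t+\tau+s-\nu NT_s)\bigr]\,ds \\
&= \frac{A^2}{NT_s}\int_0^{NT_s} \sum_{\nu=-\infty}^{\infty} E\bigl[u(\mathbf{X}_0, t+s-\nu NT_s)\, u(\mathbf{X}_0, t+\tau+s-\nu NT_s)\bigr]\,ds \\
&= \frac{A^2}{NT_s}\int_{-\infty}^{\infty} E\bigl[u(\mathbf{X}_0, \xi)\, u(\mathbf{X}_0, \xi+\tau)\bigr]\,d\xi \\
&= \frac{A^2}{NT_s}\int_{-\infty}^{\infty} E\Bigl[\sum_{\eta=1}^{N} X_\eta\, g(\xi-\eta T_s) \sum_{\eta'=1}^{N} X_{\eta'}\, g(\xi+\tau-\eta' T_s)\Bigr]\,d\xi \\
&= \frac{A^2}{NT_s}\sum_{\eta=1}^{N}\sum_{\eta'=1}^{N} E[X_\eta X_{\eta'}]\, R_{gg}\bigl(\tau+(\eta-\eta')T_s\bigr), \quad \tau \in \mathbb{R},
\end{aligned}$$

where the third equality follows from (14.36), (14.39), and (14.40); the fifth follows
from (14.43); the sixth because the N-tuples $(\mathbf{X}_\eta, \eta \in \mathbb{Z})$ are IID; the seventh by
substituting $\xi = t + s - \nu NT_s$ in the $\nu$-th term; the eighth by the definition (14.40)
of the function $u(\cdot)$; and the final equality by swapping the summations and the
expectation.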

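The process $(X'(t))$ is thus a WSS process of PSD (as defined in Definition 25.7.2)

$$S_{X'X'}(f) = \frac{A^2}{NT_s}\sum_{\ell=1}^{N}\sum_{\ell'=1}^{N} E[X_\ell X_{\ell'}]\, e^{i2\pi f(\ell-\ell')T_s}\, |\hat{g}(f)|^2, \quad f \in \mathbb{R}. \tag{15.44}$$

The proof proceeds now along the same lines as the proof of Theorem 15.5.1. □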
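The double sum in (15.44) is easy to evaluate numerically once the $N \times N$ matrix of second moments $E[X_\ell X_{\ell'}]$ of one block is known. The sketch below does so; the name `block_pam_psd`, the rectangular pulse, and the example matrices are hypothetical choices of ours, not part of the theorem.

```python
import numpy as np

def block_pam_psd(f, R, A, Ts, g_hat):
    """(A^2/(N Ts)) * sum_{l,l'} R[l,l'] exp(i 2 pi f (l - l') Ts) * |g_hat(f)|^2.

    R[l, l'] holds E[X_l X_l'] for the N symbols of one block; f is a 1-D array.
    """
    f = np.asarray(f, dtype=float)
    R = np.asarray(R, dtype=float)
    N = R.shape[0]
    idx = np.arange(N)
    lag = idx[:, None] - idx[None, :]                     # l - l'
    phases = np.exp(2j * np.pi * f[:, None, None] * lag * Ts)
    total = np.real(np.sum(R * phases, axis=(1, 2)))      # real for symmetric R
    return (A**2 / (N * Ts)) * total * np.abs(g_hat(f))**2

A, Ts = 1.0, 1.0                                          # hypothetical values
g_hat = lambda f: Ts * np.sinc(f * Ts)                    # FT of t -> I{|t| <= Ts/2}
f = np.linspace(-3.0, 3.0, 121)

psd_uncorrelated = block_pam_psd(f, np.eye(2), A, Ts, g_hat)
psd_correlated = block_pam_psd(f, [[1.0, 0.5], [0.5, 1.0]], A, Ts, g_hat)
```

With the identity matrix the expression reduces to $(A^2/T_s)\,|\hat{g}(f)|^2$, the familiar result for uncorrelated unit-variance symbols; the off-diagonal entries of the second matrix add a $\cos(2\pi f T_s)$ term.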
15.6 Exercises

Exercise 15.1 (Scaling a SP). Let $(Y(t))$ be the result of scaling the SP $(X(t))$ by the
real number $\alpha$. Thus, $Y(t) = \alpha X(t)$ for every epoch $t \in \mathbb{R}$. Show that if $(X(t))$ is of
operational PSD $S_{XX}$, then $(Y(t))$ is of operational PSD $f \mapsto \alpha^2 S_{XX}(f)$.

Exercise 15.2 (The Operational PSD of a Sum of Independent SPs). Intuition suggests
that if $(X(t))$ and $(Y(t))$ are centered independent stochastic processes of operational
PSDs $S_{XX}$ and $S_{YY}$, then their sum should be of operational PSD $f \mapsto S_{XX}(f) + S_{YY}(f)$.
Explain why.

Exercise 15.3 (Operational PSD of a Deterministic SP). Let $(X(t))$ be deterministically
equal to the energy-limited signal $g\colon \mathbb{R} \to \mathbb{R}$ in the sense that, at every epoch $t \in \mathbb{R}$, the
RV $X(t)$ is deterministically equal to $g(t)$. Find the operational PSD of $(X(t))$.

Exercise 15.4 (Stretching Time). Let $(X(t))$ be of operational PSD $S_{XX}$, and let $a > 0$
be fixed. Define the SP $(Y(t))$ at every epoch $t \in \mathbb{R}$ as $Y(t) = X(t/a)$. Show that $(Y(t))$
is of operational PSD $f \mapsto a\,S_{XX}(af)$.

Exercise 15.5 (The Operational PSD is Nonnegative). Show that if $(X(t), t \in \mathbb{R})$ is of
operational PSD $S_{XX}$, then $S_{XX}(f)$ must be nonnegative outside a set of frequencies of
Lebesgue measure zero. Would this also have been true if we had not insisted that the
operational PSD be symmetric?

Hint: Proceed along the lines of the proof of Lemma 15.3.2.

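Exercise 15.6 (Operational PSD of PAM). Let $(X_\ell, \ell \in \mathbb{Z})$ be IID with $X_\ell$ taking on
the values ±1 equiprobably. Let

$$g(t) = I\{|t| \le T_s/2\}, \quad t \in \mathbb{R},$$

$$\mathbf{X}_1(t) = A \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R},$$

where $A, T_s > 0$ are deterministic.

(i) Plot a sample function of $\mathbf{X}_1$ for a realization of $(X_\ell, \ell \in \mathbb{Z})$ of your choice.
(ii) Compute the operational PSD of $\mathbf{X}_1$.
(iii) Repeat Parts (i) and (ii) for

$$\mathbf{X}_2(t) = A \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - 2\ell T_s), \quad t \in \mathbb{R}.$$

(iv) How do the operational PSDs of $\mathbf{X}_1$ and $\mathbf{X}_2$ compare?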

Exercise 15.7 (Spectral Shaping via Precoding). Let $(X_\ell, \ell \in \mathbb{Z})$ be IID with $X_\ell$ taking
on the values ±1 equiprobably. Let $\tilde{X}_\ell = X_\ell + X_{\ell-1}$ for every $\ell \in \mathbb{Z}$.

(i) Compute the operational PSD of the PAM signal

$$\mathbf{X}_1(t) = \sum_{\ell=-\infty}^{\infty} \tilde{X}_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R},$$

for $g(\cdot)$ decaying to zero sufficiently fast as $|t| \to \infty$, e.g., satisfying (14.17).

(ii) Throw mathematical caution to the wind and evaluate your answer for the pulse
shape whose FT is

$$\hat{g}(f) = I\Bigl\{|f| \le \frac{1}{2T_s}\Bigr\}, \quad f \in \mathbb{R}.$$

(Ignore the fact that this pulse shape does not satisfy (14.17).) Plot your answer
and compare it to the operational PSD of the PAM signal

$$\mathbf{X}_2(t) = \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}.$$

(iii) Show that $\mathbf{X}_1$ can also be written as a PAM signal with IID symbols but with a
different pulse shape. That is,

$$\mathbf{X}_1(t) = \sum_{\ell=-\infty}^{\infty} X_\ell\, h(t - \ell T_s), \quad h\colon t \mapsto g(t) + g(t - T_s).$$

Exercise 15.8 (The Operational PSD and Block Codes). PAM is used in block-mode in
conjunction with the (1,2) binary-to-reals block encoder

$$0 \mapsto (+1,-1), \quad 1 \mapsto (-1,+1)$$

to transmit IID random bits. The pulse shape $g(\cdot)$ satisfies the decay condition (14.17).
Compute the power and operational PSD of the signal.

Exercise 15.9 (Repetitions and the Operational PSD). Let $(X(t))$ be the signal (15.22)
that results when the (1, 2) binary-to-reals block-encoder (10.4) is used in bi-infinite
block-mode. Find the operational PSD of $(X(t))$.

Exercise 15.10 (Direct-Sequence Spread-Spectrum Communications). This problem is
motivated by uncoded Direct-Sequence Spread-Spectrum communications with processing
gain $N$. Let the $(1, N)$ binary-to-reals block encoder map 0 to the sequence $a_1, \ldots, a_N$
and 1 to $-a_1, \ldots, -a_N$. Consider PAM with bi-infinite block encoding with this mapping.
Express the operational PSD of the resulting PAM signal in terms of the sequence
$a_1, \ldots, a_N$ and the pulse shape $g$. Calculate explicitly when the pulse shape is the
mapping $t \mapsto I\{|t| \le T_s/2\}$ for two cases: when the sequence $a_1, \ldots, a_N$ is the Barker-7 code
$(+1,+1,+1,-1,-1,+1,-1)$ and when it is the sequence $(+1,+1,+1,+1,+1,+1,+1)$.
Compare the latter case with the case where the mapping is the antipodal mapping
$0 \mapsto +1$ and $1 \mapsto -1$, the baud period is $7T_s$, and the pulse shape is $t \mapsto I\{|t| \le 7T_s/2\}$.



Chapter 16 

Quadrature Amplitude Modulation 

16.1 Introduction 

We next discuss linear modulation in passband. We envision being allocated
bandwidth $W$ around the carrier frequency $f_c$, so we can only send real signals whose
Fourier Transform is zero at frequencies $f$ satisfying $\bigl||f| - f_c\bigr| > W/2$. That
is, the FT of the transmitted signal is allowed to be nonzero only in the
frequency interval $[f_c - W/2, f_c + W/2]$ and in its negative frequency counterpart
$[-f_c - W/2, -f_c + W/2]$ (Definition 7.3.1). We assume throughout this chapter
that

$$f_c > \frac{W}{2}. \tag{16.1}$$

There are numerous ways to communicate in passband and, to complicate things 
further, sometimes seemingly different approaches lead to identical signals. Thus, 
while we would like to motivate the scheme we shall focus on — Quadrature Ampli- 
tude Modulation (QAM) — we cannot prove or claim that it is the only "optimal" 
solution. 1 Nevertheless, we shall try to motivate it by discussing some features 
that one would typically like to have and by then showing that QAM has these 
features. 

From our studies of PAM we recall that if we are allocated (baseband) bandwidth
$W$ Hz and if $T_s \ge 1/(2W)$, then we can find a bandwidth-$W$ pulse shape
whose time shifts by integer multiples of $T_s$ are orthonormal. If $T_s = 1/(2W)$, then
such a pulse is the bandwidth-$W$ unit-energy pulse $t \mapsto \sqrt{2W}\,\mathrm{sinc}(2Wt)$. (You may
recall that such pulses are rarely used because they decay to zero too slowly over
time, thus rendering the computation of the PAM signal unstable and the resulting
peak power unbounded.) And if $T_s < 1/(2W)$, then no such pulse shape exists
(Corollary 11.3.5).

From a somewhat more abstract perspective, PAM with the above pulse shape (or 
with the square root of a raised-cosine pulse shape (11.29) with very small excess 



1 There are information theoretic considerations that show that QAM can achieve the capacity 
of the bandlimited passband additive white Gaussian noise channel. 
bandwidth) allows us to send symbols arriving at rate

$$R_s \left[\frac{\text{real symbol}}{\text{second}}\right]$$

as the coefficients in a linear combination of orthonormal signals whose bandwidth
does not exceed (or only slightly exceeds)

$$\frac{R_s}{2}\ [\text{Hz}].$$

That is, for each spectral sliver of 1 Hz at baseband we obtain 2 real dimensions
per second, i.e., we can communicate at spectral efficiency

$$2\ \frac{[\text{real dimension/sec}]}{[\text{baseband Hz}]}.$$

This is an achievement that we would like to replicate for passband signaling: 

First Objective: Find a way to transmit real symbols arriving at rate $R_s$ real
symbols per second as the coefficients in a linear combination of orthonormal passband
signals occupying a (passband) bandwidth of $W$ Hz around the carrier frequency $f_c$,
where the bandwidth $W$ is equal to (or only slightly exceeds) $R_s/2$. That is, we
would like to find a communication scheme that would allow us to communicate at

$$2\ \frac{[\text{real dimension/sec}]}{[\text{passband Hz}]}.$$

Equivalently, since any stream of real symbols arriving at rate $R_s$ real symbols
per second can be viewed as a stream of complex symbols arriving at rate $R_s/2$
complex symbols per second (simply by pairing tuples $(a, b)$ of real numbers $a, b \in \mathbb{R}$
into single complex numbers $a + ib$), we can restate our objective as follows: find
a way to transmit complex symbols arriving at rate $R_s/2$ complex symbols per
second as the coefficients in a linear combination of orthonormal passband signals
occupying a (passband) bandwidth of $W$ Hz around the carrier frequency $f_c$, where
the bandwidth $W$ is equal to, or only slightly exceeds, $R_s/2$. That is, we would like
to find a communication scheme that would allow us to communicate at

$$1\ \frac{[\text{complex dimension/sec}]}{[\text{passband Hz}]}. \tag{16.2}$$



In addition, we would like our modulation scheme to be of reasonable complexity. 
One of the benefits of the baseband PAM scheme is that we can compute all the 
inner products required to reconstruct the coefficients (symbols) using the matched 
filter by feeding it with the transmitted signal and sampling its output at the 
appropriate times. 

A naive approach that does not achieve our objective is to use real baseband PAM
of the type we studied in Chapter 10 and to up-convert the PAM signal to passband
by multiplying it by the mapping $t \mapsto \cos(2\pi f_c t)$. The problem with this approach
is that the up-conversion doubles the bandwidth (Proposition 7.3.3).
16.2 PAM for Passband?

A natural approach to passband signaling might be to consider PAM directly without
any up-conversion. We merely have to look for a pulse shape $\phi$ whose Fourier
Transform is zero outside the band $\bigl||f| - f_c\bigr| \le W/2$ and whose self-similarity
function $R_{\phi\phi}$ is a Nyquist Pulse. It turns out that with this approach we can only
achieve our objective if $4 f_c T_s$ is an odd integer. Indeed, the reader is encouraged
to use Corollary 11.3.4 to verify that if a pulse $\phi$ is an energy-limited passband
signal that is bandlimited to $W$ Hz around the carrier frequency $f_c$, and if its time
shifts by integer multiples of $T_s$ are orthonormal, then

$$T_s \ge \frac{1}{2W}$$

with equality being achievable only if both

$$|\hat{\phi}(f)|^2 = T_s\, I\bigl\{\bigl||f| - f_c\bigr| \le W/2\bigr\}$$

(for all frequencies $f \in \mathbb{R}$ outside a set of Lebesgue measure zero) and

$$4 f_c T_s \text{ is an odd integer.} \tag{16.3}$$

In fact, it can be shown that if (16.3) is satisfied and if $\psi$ is any energy-limited
signal that is bandlimited to $W/2$ Hz and whose time shifts by integer multiples
of $2T_s$ are orthonormal, then the passband signal

$$\phi(t) = \sqrt{2}\cos(2\pi f_c t)\,\psi(t), \quad t \in \mathbb{R}$$

is an energy-limited passband signal that is bandlimited to $W$ Hz around the carrier
frequency $f_c$, and its time shifts by integer multiples of $T_s$ are orthonormal.
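The role of condition (16.3) can be seen numerically. The sketch below takes the brick-wall spectrum $|\hat{\phi}(f)|^2 = T_s\, I\{||f| - f_c| \le W/2\}$ with $W = 1/(2T_s)$ and sums its shifts by integer multiples of $1/T_s$, which by Corollary 11.3.4 must equal $T_s$ for the time shifts to be orthonormal. The function name `folded_spectrum`, the grid, and the two carrier frequencies are illustrative assumptions of ours; the grid is chosen to avoid the band edges, where the indicator is ambiguous on a set of measure zero.

```python
import numpy as np

def folded_spectrum(f, fc, Ts, jmax=50):
    """Sum over j of |phi_hat(f + j/Ts)|^2 for the brick-wall passband spectrum
    |phi_hat(f)|^2 = Ts * I{ ||f| - fc| <= W/2 } with W = 1/(2 Ts)."""
    f = np.asarray(f, dtype=float)
    W = 1.0 / (2.0 * Ts)
    total = np.zeros_like(f)
    for j in range(-jmax, jmax + 1):
        shifted = f + j / Ts
        total += Ts * (np.abs(np.abs(shifted) - fc) <= W / 2)
    return total

Ts = 1.0
f = (np.arange(200) + 0.5) / (200.0 * Ts)      # one period [0, 1/Ts), off the band edges

odd_case = folded_spectrum(f, fc=1.25, Ts=Ts)  # 4 fc Ts = 5, an odd integer
even_case = folded_spectrum(f, fc=1.0, Ts=Ts)  # 4 fc Ts = 4, an even integer
```

In the odd case the positive-frequency and negative-frequency bands interleave after folding and tile the period exactly once; in the even case they land on top of each other, so the folded spectrum is $2T_s$ on half the period and zero on the rest.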

It would thus seem that if (16.3) is satisfied, then PAM would be a viable solution 
to our problem. Nevertheless, this is not the standard solution. The reason may 
have to do with implementation. If the above approach is used, then the carrier 
frequency influences the choice of the pulse shape. Thus, a radio with a selectable 
carrier frequency would require a different pulse shape for each frequency! More- 
over, the implementation of the modulator becomes carrier-dependent and fairly 
complex. This discussion motivates our second objective: 

Second Objective: To allow for flexibility in the choice of the carrier, it is desir- 
able to decouple the pulse shape selection from the carrier frequency. 

16.3 The QAM Signal 

Quadrature Amplitude Modulation achieves both our objectives. It achieves our 
desired spectral efficiency (16.2) and also decouples the signal design from the 
carrier frequency. It is easiest to describe QAM by describing the baseband
representation $x_{\mathrm{BB}}(\cdot)$ of the transmitted passband signal $x_{\mathrm{PB}}(\cdot)$. Indeed, the baseband
representation of the transmitted signal has the structure of PAM but with one
important difference: we allow for complex symbols and for complex pulse shapes. 2



2 Allowing complex pulse shapes is not critical. Crucial is that we allow complex symbols.
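In QAM the encoder

$$\varphi\colon \{0,1\}^k \to \mathbb{C}^n \tag{16.4}$$

maps $k$-tuples of data bits $(D_1, \ldots, D_k)$ to $n$-tuples of complex symbols $(C_1, \ldots, C_n)$,
and the baseband representation of the transmitted signal is

$$X_{\mathrm{BB}}(t) = A\sum_{\ell=1}^{n} C_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}, \tag{16.5a}$$

where the pulse shape $g(\cdot)$ may be complex (though it is often chosen to be real),
$A > 0$ is a real constant, $T_s > 0$ is the baud period, and $1/T_s$ is the baud rate. The
rate of the encoder is given by

$$\frac{k}{n}\ \left[\frac{\text{bit}}{\text{complex symbol}}\right], \tag{16.5b}$$

and the transmitted real passband QAM signal $X_{\mathrm{PB}}(\cdot)$ is given by

$$X_{\mathrm{PB}}(t) = 2\operatorname{Re}\bigl(X_{\mathrm{BB}}(t)\, e^{i2\pi f_c t}\bigr), \quad t \in \mathbb{R}. \tag{16.5c}$$

Using (16.5a) & (16.5c) we can also express the QAM signal as

$$X_{\mathrm{PB}}(t) = 2\operatorname{Re}\Bigl(A\sum_{\ell=1}^{n} C_\ell\, g(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr), \quad t \in \mathbb{R}. \tag{16.6}$$

Alternatively, we can use the identities

$$\operatorname{Re}(wz) = \operatorname{Re}(w)\operatorname{Re}(z) - \operatorname{Im}(w)\operatorname{Im}(z), \quad w, z \in \mathbb{C},$$

$$\operatorname{Im}(z) = -\operatorname{Re}(iz), \quad z \in \mathbb{C}$$

to express the QAM signal as

$$\begin{aligned}
X_{\mathrm{PB}}(t) &= \sqrt{2}A\sum_{\ell=1}^{n} \operatorname{Re}(C_\ell)\, \underbrace{2\operatorname{Re}\Bigl(\underbrace{\tfrac{1}{\sqrt{2}}\, g(t - \ell T_s)}_{g_{I,\ell,\mathrm{BB}}(t)}\, e^{i2\pi f_c t}\Bigr)}_{g_{I,\ell}(t)} \\
&\quad + \sqrt{2}A\sum_{\ell=1}^{n} \operatorname{Im}(C_\ell)\, \underbrace{2\operatorname{Re}\Bigl(\underbrace{i\tfrac{1}{\sqrt{2}}\, g(t - \ell T_s)}_{g_{Q,\ell,\mathrm{BB}}(t)}\, e^{i2\pi f_c t}\Bigr)}_{g_{Q,\ell}(t)}, \quad t \in \mathbb{R},
\end{aligned} \tag{16.7}$$

where we define

$$\begin{aligned}
g_{I,\ell}(t) &\triangleq 2\operatorname{Re}\Bigl(\tfrac{1}{\sqrt{2}}\, g(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr) \\
&= 2\operatorname{Re}\bigl(g_{I,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t}\bigr), \quad t \in \mathbb{R},
\end{aligned} \tag{16.8a}$$

and

$$\begin{aligned}
g_{Q,\ell}(t) &\triangleq 2\operatorname{Re}\Bigl(i\tfrac{1}{\sqrt{2}}\, g(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr) \\
&= 2\operatorname{Re}\bigl(g_{Q,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t}\bigr), \quad t \in \mathbb{R},
\end{aligned} \tag{16.8b}$$

with corresponding baseband representations:

$$g_{I,\ell,\mathrm{BB}}(t) = \tfrac{1}{\sqrt{2}}\, g(t - \ell T_s), \quad t \in \mathbb{R}, \tag{16.9a}$$

$$g_{Q,\ell,\mathrm{BB}}(t) = i\tfrac{1}{\sqrt{2}}\, g(t - \ell T_s), \quad t \in \mathbb{R}. \tag{16.9b}$$

Some comments about the QAM signal: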



(i) The representation (16.7) demonstrates that the QAM signal is a linear
combination of the waveforms $\{g_{I,\ell}\}$ and $\{g_{Q,\ell}\}$, where the coefficients are
proportional to the real parts and the imaginary parts of the symbols $\{C_\ell\}$.

(ii) The normalization factor of $1/\sqrt{2}$ in the definition of the functions $\{g_{I,\ell}\}$ and
$\{g_{Q,\ell}\}$ is for convenience only. Its role will become clearer in Section 16.5,
where the pulse shape is chosen to be of unit energy. In this case the factor of
$1/\sqrt{2}$ guarantees that the functions $\{g_{I,\ell}\}$ and $\{g_{Q,\ell}\}$ are also of unit energy.

(iii) We could also view QAM slightly differently as a modulation scheme where
data bits $D_1, \ldots, D_k$ are mapped to $2n$ real numbers $X_1, \ldots, X_{2n}$, which are
then grouped in pairs to form the $n$ complex numbers $C_\ell = X_{2\ell-1} + iX_{2\ell}$
for $\ell = 1, \ldots, n$ and where these complex numbers are then mapped into the
passband signal whose baseband representation is given in (16.5a). The two
views are, of course, completely equivalent.
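The equivalence of (16.6) and (16.7) is a pointwise algebraic identity, so it can be checked on any time grid. The following sketch does this for a complex pulse; the function names, the Gaussian-type pulse, the symbols, and the parameter values are hypothetical and serve only as an illustration.

```python
import numpy as np

def qam_passband(C, g, A, Ts, fc, t):
    """X_PB(t) = 2 Re( A * sum_l C_l g(t - l Ts) e^{i 2 pi fc t} ), l = 1..n, as in (16.6)."""
    x_bb = A * sum(c * g(t - l * Ts) for l, c in enumerate(C, start=1))
    return 2.0 * np.real(x_bb * np.exp(2j * np.pi * fc * t))

def qam_passband_iq(C, g, A, Ts, fc, t):
    """The same signal built from the waveforms g_{I,l} and g_{Q,l} of (16.8), as in (16.7)."""
    carrier = np.exp(2j * np.pi * fc * t)
    x = np.zeros_like(t)
    for l, c in enumerate(C, start=1):
        g_I = 2.0 * np.real(g(t - l * Ts) / np.sqrt(2) * carrier)
        g_Q = 2.0 * np.real(1j * g(t - l * Ts) / np.sqrt(2) * carrier)
        x += np.sqrt(2) * A * (c.real * g_I + c.imag * g_Q)
    return x

# Hypothetical complex pulse and parameters, chosen only for illustration.
g = lambda t: np.exp(-t**2) * (1.0 + 0.5j * t)
C = np.array([1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j, 3 - 1j])
A, Ts, fc = 0.7, 1.0, 5.0
t = np.linspace(-4.0, 10.0, 2001)

x_direct = qam_passband(C, g, A, Ts, fc, t)
x_iq = qam_passband_iq(C, g, A, Ts, fc, t)
```

The two arrays agree to rounding error, which reflects that (16.7) only regroups the terms of (16.6) using the two identities for the real part of a product.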

The expression for the QAM signal $X_{\mathrm{PB}}(\cdot)$ is simplified if the pulse shape $g$ is real.
In this case we obtain from (16.6) for every $t \in \mathbb{R}$

$$\begin{aligned}
X_{\mathrm{PB}}(t) &= 2A\sum_{\ell=1}^{n} \operatorname{Re}(C_\ell)\, g(t - \ell T_s)\cos(2\pi f_c t) \\
&\quad - 2A\sum_{\ell=1}^{n} \operatorname{Im}(C_\ell)\, g(t - \ell T_s)\sin(2\pi f_c t), \quad g \text{ real}.
\end{aligned} \tag{16.10}$$

Thus, if the pulse shape $g$ is real, then the QAM signal can be viewed as the
sum of two signals: the first is the result of feeding $\{\operatorname{Re}(C_\ell)\}$ to a baseband PAM
modulator of pulse shape $g$ and multiplying the result by $\cos(2\pi f_c t)$, and the second
is the result of feeding $\{\operatorname{Im}(C_\ell)\}$ to a baseband PAM modulator of pulse shape $g$
and multiplying the result by $-\sin(2\pi f_c t)$. Figure 16.1 illustrates the generation
of the QAM signal when the pulse shape $g$ is real.
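The two-branch structure of (16.10) translates directly into code. The sketch below builds the in-phase and quadrature branches from a real PAM modulator and compares the result with the definition (16.5c); the helper names `pam` and `qam_real_pulse`, the sinc pulse, and the parameter values are our own assumptions.

```python
import numpy as np

def pam(symbols, g, A, Ts, t):
    """Baseband PAM: A * sum_l x_l g(t - l Ts), l = 1..n, for real symbols and real g."""
    return A * sum(x * g(t - l * Ts) for l, x in enumerate(symbols, start=1))

def qam_real_pulse(C, g, A, Ts, fc, t):
    """(16.10): two PAM branches, multiplied by cos and by -sin, added and scaled by 2."""
    in_phase = pam(np.real(C), g, A, Ts, t) * np.cos(2 * np.pi * fc * t)
    quadrature = pam(np.imag(C), g, A, Ts, t) * (-np.sin(2 * np.pi * fc * t))
    return 2.0 * (in_phase + quadrature)

g = lambda t: np.sinc(t)                      # a real pulse; an illustrative choice
C = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])
A, Ts, fc = 1.0, 1.0, 4.0
t = np.linspace(-10.0, 15.0, 4001)

x_pb = qam_real_pulse(C, g, A, Ts, fc, t)

# Reference: the definition X_PB(t) = 2 Re( X_BB(t) e^{i 2 pi fc t} ).
x_bb = A * sum(c * g(t - l * Ts) for l, c in enumerate(C, start=1))
x_ref = 2.0 * np.real(x_bb * np.exp(2j * np.pi * fc * t))
```

Note that only real arithmetic is needed in `qam_real_pulse`, which is what makes the real-pulse case attractive for implementation.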
[Figure 16.1 is a block diagram: the symbols $\{C_\ell\}$ are split into $\{\operatorname{Re}(C_\ell)\}$ and
$\{\operatorname{Im}(C_\ell)\}$; each stream drives a PAM modulator; the two PAM outputs are multiplied
by $\cos(2\pi f_c t)$ and by $-\sin(2\pi f_c t)$ (obtained through a 90° phase shift) and then
added to produce $x_{\mathrm{PB}}(t)/2$.]

Figure 16.1: Generating a QAM signal when the pulse shape g is real.



16.4 Bandwidth Considerations 

Recalling that the bandwidth of a passband signal around the carrier frequency is 
twice the bandwidth of its baseband representation (Proposition 7.6.7 and Theo- 
rem 7.7.12 (i)) we conclude: 

Note 16.4.1. If the pulse shape g is bandlimited to W/2 Hz, then the QAM signal 
(16.6) is bandlimited to W Hz around the carrier frequency f c . 

If the pulse shape $g$ is real, then these bandwidth considerations can also be
explained in another way. We note that if $g(\cdot)$ is bandlimited to $W/2$ Hz then
the signal $\sum_{\ell} \operatorname{Re}(C_\ell)\, g(t - \ell T_s)$ is also bandlimited to $W/2$ Hz, so when it is
up-converted by multiplication by $\cos(2\pi f_c t)$ the resulting signal is bandlimited to $W$
Hz around the carrier frequency $f_c$ (Proposition 7.3.3). A similar argument holds
for the signal that is multiplied by $-\sin(2\pi f_c t)$.



16.5 Orthogonality Considerations 

We next study the consequences of choosing the pulse shape $g(\cdot)$ so that its time
shifts by integer multiples of $T_s$ be orthonormal. As in our treatment of PAM, we
change notation and denote the pulse shape in this case by $\phi(\cdot)$. The orthonormality
condition is thus

$$\int_{-\infty}^{\infty} \phi(t - \ell T_s)\, \phi^*(t - \ell' T_s)\,dt = I\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}. \tag{16.11}$$
By Corollary 11.3.4, this is equivalent to requiring that

$$\sum_{\ell=-\infty}^{\infty} \Bigl|\hat{\phi}\Bigl(f + \frac{\ell}{T_s}\Bigr)\Bigr|^2 = T_s, \tag{16.12}$$

for all frequencies $f$ outside a set of Lebesgue measure zero.

When the pulse shape satisfies the orthogonality condition (16.11) we refer to $1/T_s$
as having units of complex dimensions per second. In analogy to Definition 11.3.6,
we define the excess bandwidth as

$$100\% \left(\frac{\text{bandwidth of } \phi}{1/(2T_s)} - 1\right). \tag{16.13}$$
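As a small worked example of the excess-bandwidth definition, the sketch below evaluates it for a pulse of bandwidth $W/2$ at two baud periods. The function name and the numerical values ($W = 1$ MHz and the two baud periods) are hypothetical.

```python
def excess_bandwidth_percent(pulse_bandwidth_hz, Ts):
    """100% * ( (bandwidth of the pulse) / (1/(2 Ts)) - 1 ), following (16.13)."""
    return 100.0 * (pulse_bandwidth_hz / (1.0 / (2.0 * Ts)) - 1.0)

W = 1.0e6                                        # passband bandwidth in Hz (illustrative)

# Pulse of bandwidth W/2 with the smallest baud period Ts = 1/W: no excess bandwidth.
no_excess = excess_bandwidth_percent(W / 2, Ts=1.0 / W)

# The same pulse bandwidth with a 25% longer baud period: 25% excess bandwidth.
quarter_excess = excess_bandwidth_percent(W / 2, Ts=1.25 / W)
```

The first case corresponds to signaling at $1/T_s = W$ complex symbols per second; the second trades a quarter of that rate for a pulse that can decay faster in time.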

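Proposition 16.5.1. If the energy-limited pulse shape $\phi$ satisfies (16.11), then the
QAM signal $X_{\mathrm{PB}}(\cdot)$ can be expressed as

$$X_{\mathrm{PB}} = \sqrt{2}A\sum_{\ell=1}^{n} \operatorname{Re}(C_\ell)\,\psi_{I,\ell} + \sqrt{2}A\sum_{\ell=1}^{n} \operatorname{Im}(C_\ell)\,\psi_{Q,\ell} \tag{16.14}$$

where

$$\ldots,\ \psi_{I,-1},\ \psi_{Q,-1},\ \psi_{I,0},\ \psi_{Q,0},\ \psi_{I,1},\ \psi_{Q,1},\ \ldots$$

are orthonormal functions that are given by

$$\psi_{I,\ell}\colon t \mapsto 2\operatorname{Re}\Bigl(\tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr), \quad \ell \in \mathbb{Z}, \tag{16.15a}$$

$$\psi_{Q,\ell}\colon t \mapsto 2\operatorname{Re}\Bigl(i\tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr), \quad \ell \in \mathbb{Z}. \tag{16.15b}$$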

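Proof. Substituting $\phi$ for $g$ in (16.7) we obtain

$$\begin{aligned}
X_{\mathrm{PB}}(t) &= \sqrt{2}A\sum_{\ell=1}^{n} \operatorname{Re}(C_\ell)\, \underbrace{2\operatorname{Re}\Bigl(\underbrace{\tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s)}_{\psi_{I,\ell,\mathrm{BB}}(t)}\, e^{i2\pi f_c t}\Bigr)}_{\psi_{I,\ell}(t)} \\
&\quad + \sqrt{2}A\sum_{\ell=1}^{n} \operatorname{Im}(C_\ell)\, \underbrace{2\operatorname{Re}\Bigl(\underbrace{i\tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s)}_{\psi_{Q,\ell,\mathrm{BB}}(t)}\, e^{i2\pi f_c t}\Bigr)}_{\psi_{Q,\ell}(t)}, \quad t \in \mathbb{R},
\end{aligned}$$

where for every $t \in \mathbb{R}$

$$\begin{aligned}
\psi_{I,\ell}(t) &= 2\operatorname{Re}\Bigl(\tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr) \\
&= 2\operatorname{Re}\bigl(\psi_{I,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t}\bigr),
\end{aligned} \tag{16.16a}$$

$$\begin{aligned}
\psi_{Q,\ell}(t) &= 2\operatorname{Re}\Bigl(i\tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr) \\
&= 2\operatorname{Re}\bigl(\psi_{Q,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t}\bigr),
\end{aligned} \tag{16.16b}$$

and the baseband representations are given by

$$\psi_{I,\ell,\mathrm{BB}}(t) = \tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s) \tag{16.17a}$$

and

$$\psi_{Q,\ell,\mathrm{BB}}(t) = i\tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s). \tag{16.17b}$$

We next verify that, when $\phi$ satisfies (16.11), the functions

$$\ldots,\ \psi_{I,-1},\ \psi_{Q,-1},\ \psi_{I,0},\ \psi_{Q,0},\ \psi_{I,1},\ \psi_{Q,1},\ \ldots$$

are orthonormal. To this end we recall that the inner product between two real
passband signals is twice the real part of the inner product between their baseband
representations (Theorem 7.6.10). For $\ell \ne \ell'$ we thus have by (16.11)

$$\begin{aligned}
\langle \psi_{I,\ell}, \psi_{I,\ell'} \rangle &= 2\operatorname{Re}\bigl(\langle \psi_{I,\ell,\mathrm{BB}}, \psi_{I,\ell',\mathrm{BB}} \rangle\bigr) \\
&= 2\operatorname{Re}\Bigl(\Bigl\langle t \mapsto \tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s),\ t \mapsto \tfrac{1}{\sqrt{2}}\,\phi(t - \ell' T_s) \Bigr\rangle\Bigr) \\
&= 0,
\end{aligned}$$

$$\begin{aligned}
\langle \psi_{Q,\ell}, \psi_{Q,\ell'} \rangle &= 2\operatorname{Re}\bigl(\langle \psi_{Q,\ell,\mathrm{BB}}, \psi_{Q,\ell',\mathrm{BB}} \rangle\bigr) \\
&= 2\operatorname{Re}\Bigl(\Bigl\langle t \mapsto i\tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s),\ t \mapsto i\tfrac{1}{\sqrt{2}}\,\phi(t - \ell' T_s) \Bigr\rangle\Bigr) \\
&= 0,
\end{aligned}$$

and

$$\begin{aligned}
\langle \psi_{I,\ell}, \psi_{Q,\ell'} \rangle &= 2\operatorname{Re}\Bigl(\Bigl\langle t \mapsto \tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s),\ t \mapsto i\tfrac{1}{\sqrt{2}}\,\phi(t - \ell' T_s) \Bigr\rangle\Bigr) \\
&= 0.
\end{aligned}$$

And for $\ell = \ell'$ we have, again by (16.11),

$$\begin{aligned}
\langle \psi_{I,\ell}, \psi_{I,\ell} \rangle &= 2\operatorname{Re}\Bigl(\Bigl\langle t \mapsto \tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s),\ t \mapsto \tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s) \Bigr\rangle\Bigr) \\
&= 1,
\end{aligned}$$

$$\begin{aligned}
\langle \psi_{I,\ell}, \psi_{Q,\ell} \rangle &= 2\operatorname{Re}\Bigl(\Bigl\langle t \mapsto \tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s),\ t \mapsto i\tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s) \Bigr\rangle\Bigr) \\
&= \operatorname{Re}(-i) \\
&= 0,
\end{aligned}$$

and

$$\begin{aligned}
\langle \psi_{Q,\ell}, \psi_{Q,\ell} \rangle &= 2\operatorname{Re}\Bigl(\Bigl\langle t \mapsto i\tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s),\ t \mapsto i\tfrac{1}{\sqrt{2}}\,\phi(t - \ell T_s) \Bigr\rangle\Bigr) \\
&= 1.
\end{aligned}$$

□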

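Notice that (16.14)–(16.15) can be simplified when $\phi$ is real:

Corollary 16.5.2. If, in addition to the assumptions of Proposition 16.5.1, we also
assume that the pulse shape $\phi$ is real, then the QAM signal can be written as

$$\begin{aligned}
X_{\mathrm{PB}}(t) &= \sqrt{2}A\sum_{\ell=1}^{n} \operatorname{Re}(C_\ell)\,\sqrt{2}\,\phi(t - \ell T_s)\cos(2\pi f_c t) \\
&\quad - \sqrt{2}A\sum_{\ell=1}^{n} \operatorname{Im}(C_\ell)\,\sqrt{2}\,\phi(t - \ell T_s)\sin(2\pi f_c t), \quad t \in \mathbb{R},
\end{aligned} \tag{16.18}$$

and

$$\bigl\{t \mapsto \sqrt{2}\,\phi(t - \ell T_s)\cos(2\pi f_c t)\bigr\}_{\ell=-\infty}^{\infty}, \quad \bigl\{t \mapsto \sqrt{2}\,\phi(t - \ell T_s)\sin(2\pi f_c t)\bigr\}_{\ell=-\infty}^{\infty}$$

are orthonormal.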
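The orthonormality asserted above can be checked numerically by approximating the inner products with Riemann sums. The sketch below does this for the real pulse $t \mapsto \sqrt{W}\,\mathrm{sinc}(Wt)$ with $T_s = 1/W$; the carrier frequency, the time window, and the sampling step are illustrative assumptions of ours. Because the sinc pulse decays slowly, truncating it to a finite window leaves an error of order $10^{-3}$, so the Gram matrix is only approximately the identity.

```python
import numpy as np

W = 1.0                      # bandwidth around the carrier (illustrative value)
Ts = 1.0 / W                 # baud period
fc = 4.0                     # carrier frequency, larger than W/2
dt = 1.0 / 32.0              # sampling step, well above the Nyquist rate of all products
t = np.arange(-200.0, 200.0, dt)

def phi(t):
    """Unit-energy real pulse t -> sqrt(W) sinc(W t), bandlimited to W/2 Hz."""
    return np.sqrt(W) * np.sinc(W * t)

def psi_I(l):
    return np.sqrt(2) * phi(t - l * Ts) * np.cos(2 * np.pi * fc * t)

def psi_Q(l):
    # For real phi, 2 Re( i phi e^{i 2 pi fc t} / sqrt(2) ) = -sqrt(2) phi sin(2 pi fc t).
    return -np.sqrt(2) * phi(t - l * Ts) * np.sin(2 * np.pi * fc * t)

# Stack psi_{I,l}, psi_{Q,l} for l = -1, 0, 1 and form all pairwise inner products.
signals = np.array([s(l) for l in (-1, 0, 1) for s in (psi_I, psi_Q)])
gram = signals @ signals.T * dt
```

A wider time window or a faster-decaying pulse (for example one with some excess bandwidth) would bring `gram` closer to the identity matrix.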



16.6 Spectral Efficiency 

We next show that QAM achieves our spectral efficiency objective. We assume
that we are only allowed to transmit signals of bandwidth $W$ around the carrier
frequency $f_c$, so the transmitted signal can only occupy the frequencies $f$ satisfying

$$\bigl||f| - f_c\bigr| \le W/2.$$

In order for the QAM signal to meet this constraint, we choose a pulse shape $\phi$
that is bandlimited to $W/2$ Hz, because the up-conversion doubles the bandwidth
(Note 16.4.1). Thus, by Corollary 11.3.5, the orthogonality (16.11) can only hold
if the baud period $T_s$ satisfies $T_s \ge 1/(2 \times W/2)$ or

$$T_s \ge \frac{1}{W},$$

with the RHS being achievable by choosing $\phi$ to be the bandwidth-$W/2$ unit-energy
signal $t \mapsto \sqrt{W}\,\mathrm{sinc}(Wt)$.

If we choose $T_s$ equal to $1/W$ (or only slightly larger than that), then our modulation
will support the transmission of complex symbols arriving at a rate of $1/T_s \approx W$
complex symbols per second. And since our QAM signal only occupies $W$ Hz
around the carrier frequency, our scheme achieves a spectral efficiency of 1 [complex
dimension per second] per Hz. QAM thus achieves our spectral efficiency objective.
This is so exciting that we highlight the achievement:
QAM with the bandwidth-$W/2$ unit-energy pulse shape given by
$t \mapsto \sqrt{W}\,\mathrm{sinc}(Wt)$ transmits a sequence of real symbols arriving at
a rate of $2W$ real symbols per second as the coefficients in a linear
combination of orthogonal signals, with the resulting waveform
being bandlimited to $W$ Hz around the carrier frequency $f_c$. It
thus achieves a spectral efficiency of

$$2\ \frac{[\text{real dimension/sec}]}{[\text{passband Hz}]} = 1\ \frac{[\text{complex dimension/sec}]}{[\text{passband Hz}]}.$$



16.7 QAM Constellations 

In analogy to the definition of the constellation of a PAM scheme (Section 10.8),
we define the constellation of a QAM scheme (or, perhaps more appropriately, of
the mapping $\varphi(\cdot)$ in (16.4)) as the smallest subset of $\mathbb{C}$ of which $C_\ell$ is an element
for every $\ell \in \{1, \ldots, n\}$ and for every realization of the data bits. We denote
the constellation by $\mathcal{C}$. The number of points in the constellation $\mathcal{C}$ is just the
number of elements of $\mathcal{C}$.

Important constellations include the square 4-QAM constellation (also known as
QPSK)

$$\{+1 + i,\ -1 + i,\ -1 - i,\ +1 - i\},$$

the square QAM constellation with $(2\nu) \times (2\nu)$ points

$$\bigl\{a + ib : a, b \in \{-(2\nu - 1), \ldots, -3, -1, +1, +3, \ldots, (2\nu - 1)\}\bigr\}, \tag{16.19}$$

and the M-PSK (M-ary Phase Shift Keying) constellation comprising the $M$ complex
numbers on the unit circle whose $M$-th power is one, i.e.,

$$\bigl\{1,\ e^{i2\pi/M},\ e^{i4\pi/M},\ e^{i6\pi/M},\ \ldots,\ e^{i(M-1)2\pi/M}\bigr\}.$$

See Figure 16.2 for some common QAM constellations. Please note that the square
16-QAM and the 16-PSK are just two of many possible constellations with 16
points. However, some engineers omit the word "square" and write 4-QAM, 16-QAM,
64-QAM, etc. for the respective square constellations.

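We can also define the minimum distance $\delta$ of a constellation $\mathcal{C}$ in analogy to
(10.21) as

$$\delta \triangleq \min_{\substack{c, c' \in \mathcal{C} \\ c \ne c'}} |c - c'|. \tag{16.20}$$

In analogy to (10.23), we define the second moment of a constellation $\mathcal{C}$ as

$$\frac{1}{\#\mathcal{C}} \sum_{c \in \mathcal{C}} |c|^2. \tag{16.21}$$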
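The constellations and the two figures of merit just defined are simple to compute. The following sketch generates the square constellation (16.19) and the M-PSK constellation and evaluates (16.20) and (16.21); the function names are our own.

```python
import numpy as np

def square_qam(nu):
    """The (2 nu) x (2 nu) square constellation of (16.19)."""
    levels = np.arange(-(2 * nu - 1), 2 * nu, 2)      # -(2nu-1), ..., -1, +1, ..., 2nu-1
    return np.array([a + 1j * b for a in levels for b in levels])

def psk(M):
    """The M-PSK constellation: the M complex M-th roots of unity."""
    return np.exp(2j * np.pi * np.arange(M) / M)

def min_distance(points):
    """Smallest distance between two distinct constellation points, as in (16.20)."""
    d = np.abs(points[:, None] - points[None, :])
    return d[~np.eye(len(points), dtype=bool)].min()

def second_moment(points):
    """Average of |c|^2 over the constellation, as in (16.21)."""
    return np.mean(np.abs(points) ** 2)

qpsk = square_qam(1)        # square 4-QAM
qam16 = square_qam(2)       # square 16-QAM
psk8 = psk(8)
```

For example, the square 16-QAM constellation has minimum distance 2 and second moment 10, whereas 8-PSK has second moment 1 and minimum distance $2\sin(\pi/8)$; comparing constellations fairly therefore requires normalizing one of the two quantities.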



[Figure 16.2 shows four constellation diagrams with the panel titles 4-QAM, 16-QAM,
8-PSK, and 32-QAM.]

Figure 16.2: Some QAM constellations (drawn to no particular scale).



16.8 Recovering the Complex Symbols via Inner Products 

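Recall that, by Proposition 16.5.1, if the time shifts of $\phi$ by integer multiples of $T_s$
are orthonormal, then the QAM signal can be written as

$$X_{\mathrm{PB}} = \sqrt{2}A\sum_{\ell=1}^{n} \operatorname{Re}(C_\ell)\,\psi_{I,\ell} + \sqrt{2}A\sum_{\ell=1}^{n} \operatorname{Im}(C_\ell)\,\psi_{Q,\ell}$$

where the signals $\ldots, \psi_{I,-1}, \psi_{Q,-1}, \psi_{I,0}, \psi_{Q,0}, \psi_{I,1}, \psi_{Q,1}, \ldots$, which are given in
(16.15), are orthonormal. Consequently, the complex symbols can be recovered
from the QAM signal (in the absence of noise) using the inner product:

$$\operatorname{Re}(C_\ell) = \frac{1}{\sqrt{2}A}\,\langle X_{\mathrm{PB}}, \psi_{I,\ell} \rangle, \quad \ell \in \{1, \ldots, n\}, \tag{16.22a}$$

$$\operatorname{Im}(C_\ell) = \frac{1}{\sqrt{2}A}\,\langle X_{\mathrm{PB}}, \psi_{Q,\ell} \rangle, \quad \ell \in \{1, \ldots, n\}. \tag{16.22b}$$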
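The recovery rule (16.22) can be tried out numerically. The sketch below synthesizes a noiseless QAM signal from (16.14) with the real pulse $t \mapsto \sqrt{W}\,\mathrm{sinc}(Wt)$ and then recovers the symbols by Riemann-sum inner products. All numerical values, the symbol sequence, and the helper names are hypothetical; the finite time window makes the recovery approximate (errors of order $10^{-3}$ here).

```python
import numpy as np

W, fc, A = 1.0, 4.0, 0.5          # illustrative values, with fc > W/2
Ts = 1.0 / W
dt = 1.0 / 32.0
t = np.arange(-200.0, 200.0, dt)

def phi(t):
    return np.sqrt(W) * np.sinc(W * t)      # unit energy, orthonormal Ts-shifts

def psi_I(l):
    return np.sqrt(2) * phi(t - l * Ts) * np.cos(2 * np.pi * fc * t)

def psi_Q(l):
    return -np.sqrt(2) * phi(t - l * Ts) * np.sin(2 * np.pi * fc * t)

# Four symbols C_1, ..., C_4 from a square 16-QAM constellation.
C = np.array([1 + 1j, -1 + 3j, 3 - 1j, -3 - 3j])

# Synthesize X_PB according to (16.14).
x_pb = sum(np.sqrt(2) * A * (c.real * psi_I(l) + c.imag * psi_Q(l))
           for l, c in enumerate(C, start=1))

def inner(u, v):
    """Riemann-sum approximation of the inner product of two real signals."""
    return np.sum(u * v) * dt

# Recover the symbols according to (16.22a) and (16.22b).
C_hat = np.array([(inner(x_pb, psi_I(l)) + 1j * inner(x_pb, psi_Q(l))) / (np.sqrt(2) * A)
                  for l in range(1, len(C) + 1)])
```

Rounding each entry of `C_hat` to the nearest constellation point would return the transmitted symbols exactly in this noiseless setting.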



We next describe circuits to compute these inner products. With a view to future
chapters where noise will be present, we shall describe more general circuits that
compute the inner products $\langle r, \psi_{I,\ell} \rangle$ and $\langle r, \psi_{Q,\ell} \rangle$ for an arbitrary (not necessarily
QAM) energy-limited signal $r$. Moreover, since the calculation of the inner products
will not exploit the orthogonality condition (16.11), we shall describe the more
general setting where the pulse shape is arbitrary and refer to the notation of
(16.7). Thus, we shall present circuits to compute

$$\langle r, g_{I,\ell} \rangle, \quad \langle r, g_{Q,\ell} \rangle,$$

where $g_{I,\ell}$ and $g_{Q,\ell}$ and their baseband representations are given in (16.8) and
(16.9). Here $r$ is an arbitrary energy-limited signal. We present two approaches:
an approach based on baseband conversion and a direct approach.

16.8.1 Inner Products via Baseband Conversion 

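We begin by noting that if the pulse shape $g$ is bandlimited to $W/2$ Hz then both
$g_{I,\ell}$ and $g_{Q,\ell}$ are bandlimited to $W$ Hz around the carrier frequency $f_c$.
Consequently, since they contain no energy outside the bands $[f_c - W/2, f_c + W/2]$ and
$[-f_c - W/2, -f_c + W/2]$, it follows from Parseval's Theorem that the Fourier Transform
of $r$ outside these bands does not influence the value of the inner products.
Thus, if $s$ is the result of passing $r$ through an ideal unit-gain bandpass filter of
bandwidth $W$ around the carrier frequency $f_c$, i.e.,

$$s = r \star \mathrm{BPF}_{W,f_c}, \tag{16.23}$$

then

$$\langle r, g_{I,\ell} \rangle = \langle s, g_{I,\ell} \rangle, \tag{16.24a}$$

$$\langle r, g_{Q,\ell} \rangle = \langle s, g_{Q,\ell} \rangle. \tag{16.24b}$$

If we denote the baseband representation of $s$ by $s_{\mathrm{BB}}$, then

$$\begin{aligned}
\langle r, g_{I,\ell} \rangle &= \langle s, g_{I,\ell} \rangle \\
&= 2\operatorname{Re}\bigl(\langle s_{\mathrm{BB}}, g_{I,\ell,\mathrm{BB}} \rangle\bigr) \\
&= \sqrt{2}\operatorname{Re}\bigl(\langle s_{\mathrm{BB}},\ t \mapsto g(t - \ell T_s) \rangle\bigr),
\end{aligned} \tag{16.25a}$$

where the first equality follows from (16.24a); the second from Theorem 7.6.10;
and the final equality from (16.9a). Similarly,

$$\begin{aligned}
\langle r, g_{Q,\ell} \rangle &= \langle s, g_{Q,\ell} \rangle \\
&= 2\operatorname{Re}\bigl(\langle s_{\mathrm{BB}}, g_{Q,\ell,\mathrm{BB}} \rangle\bigr) \\
&= \sqrt{2}\operatorname{Re}\bigl(\langle s_{\mathrm{BB}},\ t \mapsto i\,g(t - \ell T_s) \rangle\bigr) \\
&= \sqrt{2}\operatorname{Im}\bigl(\langle s_{\mathrm{BB}},\ t \mapsto g(t - \ell T_s) \rangle\bigr).
\end{aligned} \tag{16.25b}$$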

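We next describe circuits to compute the RHS of (16.25a) & (16.25b). The circuit
to produce $s_{\mathrm{BB}}$ from $s$ was already discussed in Section 7.6 on the baseband
representation of passband signals (Figure 7.11). One multiplies $s(t)$ by $e^{-i2\pi f_c t}$ and
then passes the result through a lowpass filter whose cutoff frequency $W_c$ satisfies

$$\frac{W}{2} \le W_c \le 2f_c - \frac{W}{2},$$

i.e.,

$$s_{\mathrm{BB}} = \bigl(t \mapsto s(t)\, e^{-i2\pi f_c t}\bigr) \star \mathrm{LPF}_{W_c},$$

or, in terms of real operations:

$$\operatorname{Re}(s_{\mathrm{BB}}) = \bigl(t \mapsto s(t)\cos(2\pi f_c t)\bigr) \star \mathrm{LPF}_{W_c},$$

$$\operatorname{Im}(s_{\mathrm{BB}}) = -\bigl(t \mapsto s(t)\sin(2\pi f_c t)\bigr) \star \mathrm{LPF}_{W_c}.$$

This circuit is depicted in Figure 16.3. Notice that this circuit depends only on
the carrier frequency $f_c$ and on the bandwidth $W$; it does not depend on the pulse
shape.

[Figure 16.3 is a block diagram: $r(t)$ passes through $\mathrm{BPF}_{W,f_c}$ to give $s(t)$, which is
multiplied by $\cos(2\pi f_c t)$ and, through a 90° phase shift, by the quadrature carrier; each
product is fed to a lowpass filter $\mathrm{LPF}_{W_c}$ with $W/2 \le W_c \le 2f_c - W/2$, producing
$\operatorname{Re}(s_{\mathrm{BB}})$ and $\operatorname{Im}(s_{\mathrm{BB}})$.]

Figure 16.3: QAM demodulation: the front-end.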

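Once $s_{\mathrm{BB}}$ has been computed, the calculation of the inner products on the RHS of
(16.25a) & (16.25b) is straightforward. For example, to compute the inner product
on the RHS of (16.25a) we note that from (16.25a)

$$\begin{aligned}
\langle r, g_{I,\ell} \rangle &= \sqrt{2}\operatorname{Re}\Bigl(\int_{-\infty}^{\infty} s_{\mathrm{BB}}(t)\, g^*(t - \ell T_s)\,dt\Bigr) \\
&= \sqrt{2}\int_{-\infty}^{\infty} \operatorname{Re}\bigl(s_{\mathrm{BB}}(t)\bigr)\operatorname{Re}\bigl(g(t - \ell T_s)\bigr)\,dt \\
&\quad + \sqrt{2}\int_{-\infty}^{\infty} \operatorname{Im}\bigl(s_{\mathrm{BB}}(t)\bigr)\operatorname{Im}\bigl(g(t - \ell T_s)\bigr)\,dt,
\end{aligned} \tag{16.26}$$

where the terms on the RHS can be computed by feeding $\operatorname{Re}(s_{\mathrm{BB}})$ to a matched
filter matched to $\operatorname{Re}(g)$ and sampling the filter's output at time $\ell T_s$

$$\int_{-\infty}^{\infty} \operatorname{Re}\bigl(s_{\mathrm{BB}}(t)\bigr)\operatorname{Re}\bigl(g(t - \ell T_s)\bigr)\,dt = \bigl(\operatorname{Re}(s_{\mathrm{BB}}) \star \overleftarrow{\operatorname{Re}(g)}\bigr)(\ell T_s) \tag{16.27}$$

and by feeding $\operatorname{Im}(s_{\mathrm{BB}})$ to a matched filter matched to $\operatorname{Im}(g)$ and sampling the
filter's output at time $\ell T_s$

$$\int_{-\infty}^{\infty} \operatorname{Im}\bigl(s_{\mathrm{BB}}(t)\bigr)\operatorname{Im}\bigl(g(t - \ell T_s)\bigr)\,dt = \bigl(\operatorname{Im}(s_{\mathrm{BB}}) \star \overleftarrow{\operatorname{Im}(g)}\bigr)(\ell T_s). \tag{16.28}$$

Here $\overleftarrow{u}$ denotes the mirror image $t \mapsto u(-t)$ of a real signal $u$, which is the
impulse response of the filter matched to $u$.

Similarly, to compute the inner product on the RHS of (16.25b) we note that from
(16.25b)

$$\begin{aligned}
\langle r, g_{Q,\ell} \rangle &= \sqrt{2}\operatorname{Im}\Bigl(\int_{-\infty}^{\infty} s_{\mathrm{BB}}(t)\, g^*(t - \ell T_s)\,dt\Bigr) \\
&= \sqrt{2}\int_{-\infty}^{\infty} \operatorname{Im}\bigl(s_{\mathrm{BB}}(t)\bigr)\operatorname{Re}\bigl(g(t - \ell T_s)\bigr)\,dt \\
&\quad - \sqrt{2}\int_{-\infty}^{\infty} \operatorname{Re}\bigl(s_{\mathrm{BB}}(t)\bigr)\operatorname{Im}\bigl(g(t - \ell T_s)\bigr)\,dt,
\end{aligned} \tag{16.29}$$

where the inner products can be computed again using a matched filter:

$$\int_{-\infty}^{\infty} \operatorname{Im}\bigl(s_{\mathrm{BB}}(t)\bigr)\operatorname{Re}\bigl(g(t - \ell T_s)\bigr)\,dt = \bigl(\operatorname{Im}(s_{\mathrm{BB}}) \star \overleftarrow{\operatorname{Re}(g)}\bigr)(\ell T_s),$$

$$\int_{-\infty}^{\infty} \operatorname{Re}\bigl(s_{\mathrm{BB}}(t)\bigr)\operatorname{Im}\bigl(g(t - \ell T_s)\bigr)\,dt = \bigl(\operatorname{Re}(s_{\mathrm{BB}}) \star \overleftarrow{\operatorname{Im}(g)}\bigr)(\ell T_s).$$

Things become simpler when the pulse shape $g$ is real. In this case (16.26) and
(16.29) simplify to

$$\langle r, g_{I,\ell} \rangle = \sqrt{2}\int_{-\infty}^{\infty} \operatorname{Re}\bigl(s_{\mathrm{BB}}(t)\bigr)\, g(t - \ell T_s)\,dt, \quad g \text{ real}, \tag{16.30a}$$

$$\langle r, g_{Q,\ell} \rangle = \sqrt{2}\int_{-\infty}^{\infty} \operatorname{Im}\bigl(s_{\mathrm{BB}}(t)\bigr)\, g(t - \ell T_s)\,dt, \quad g \text{ real}. \tag{16.30b}$$

Diagrams demonstrating how these inner products are computed are given in
Figures 16.3 and 16.4. We have already discussed the first diagram, which includes the
front-end bandpass filter and the circuit for producing $s_{\mathrm{BB}}$. The second diagram
includes the matched filtering needed to compute the RHS of (16.30a) and the
RHS of (16.30b). Notice that we have accomplished our second objective in that
the first circuit depends only on the carrier frequency $f_c$ (and the bandwidth $W$)
and the second circuit depends on the pulse shape but not on the carrier frequency.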
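The two-stage receiver described above (carrier-dependent front end, then pulse-dependent matched filtering) can be imitated numerically. In the sketch below the ideal lowpass filter is applied in the FFT domain, which treats the finite record as periodic; this, the omission of the bandpass filter (the noiseless test signal is already confined to the band), the sinc pulse, and all parameter values are simplifying assumptions of ours and not part of the circuits in the text.

```python
import numpy as np

W, fc, A = 1.0, 4.0, 1.0           # illustrative values, with fc > W/2
Ts = 1.0 / W
dt = 1.0 / 32.0
t = np.arange(-128.0, 128.0, dt)

def g(t):
    return np.sqrt(W) * np.sinc(W * t)      # real unit-energy pulse, bandlimited to W/2 Hz

C = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j])
x_bb = A * sum(c * g(t - l * Ts) for l, c in enumerate(C, start=1))
r = 2.0 * np.real(x_bb * np.exp(2j * np.pi * fc * t))     # noiseless received signal

# Front end (depends on fc and W only): multiply by exp(-i 2 pi fc t), then apply
# an ideal lowpass filter with cutoff Wc, where W/2 <= Wc <= 2 fc - W/2.
Wc = 2.0
freqs = np.fft.fftfreq(t.size, d=dt)
down = r * np.exp(-2j * np.pi * fc * t)
s_bb = np.fft.ifft(np.fft.fft(down) * (np.abs(freqs) <= Wc))

# Matched filtering (depends on the pulse only): (16.30a) and (16.30b) as Riemann sums.
n = len(C)
ip_I = np.array([np.sqrt(2) * np.sum(s_bb.real * g(t - l * Ts)) * dt for l in range(1, n + 1)])
ip_Q = np.array([np.sqrt(2) * np.sum(s_bb.imag * g(t - l * Ts)) * dt for l in range(1, n + 1)])

# With a unit-energy pulse whose Ts-shifts are orthonormal, <r, g_I> = sqrt(2) A Re(C_l).
C_hat = (ip_I + 1j * ip_Q) / (np.sqrt(2) * A)
```

Changing the carrier frequency only affects the lines that compute `s_bb`, while changing the pulse only affects the matched-filtering lines, mirroring the decoupling that the second objective asks for.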

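[Figure 16.4 is a block diagram: $\operatorname{Re}(s_{\mathrm{BB}})$ and $\operatorname{Im}(s_{\mathrm{BB}})$ are each fed to a filter
matched to $g$ whose output is sampled at time $\ell T_s$, producing $\frac{1}{\sqrt{2}}\langle r, g_{I,\ell} \rangle$ and
$\frac{1}{\sqrt{2}}\langle r, g_{Q,\ell} \rangle$.]

Figure 16.4: QAM demodulation: matched filtering (g real).

16.8.2 Computing Inner Products Directly

The astute reader may have noticed that neither the bandpass filtering of the
signal $r$ nor the image rejection filters that produce $s_{\mathrm{BB}}$ are needed for the
computation of the inner products. Indeed, starting from (16.8a)

$$\begin{aligned}
\langle r, g_{I,\ell} \rangle &= \bigl\langle r,\ t \mapsto 2\operatorname{Re}\bigl(g_{I,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t}\bigr) \bigr\rangle \\
&= 2\operatorname{Re}\bigl(\bigl\langle r,\ t \mapsto g_{I,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t} \bigr\rangle\bigr) \\
&= 2\operatorname{Re}\bigl(\bigl\langle t \mapsto r(t)\, e^{-i2\pi f_c t},\ g_{I,\ell,\mathrm{BB}} \bigr\rangle\bigr) \\
&= \sqrt{2}\operatorname{Re}\bigl(\bigl\langle t \mapsto r(t)\, e^{-i2\pi f_c t},\ t \mapsto g(t - \ell T_s) \bigr\rangle\bigr),
\end{aligned} \tag{16.31a}$$

where the second equality follows because $r$ is real and the last equality from
(16.9a). Similarly starting from (16.8b)

$$\begin{aligned}
\langle r, g_{Q,\ell} \rangle &= \bigl\langle r,\ t \mapsto 2\operatorname{Re}\bigl(g_{Q,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t}\bigr) \bigr\rangle \\
&= 2\operatorname{Re}\bigl(\bigl\langle r,\ t \mapsto g_{Q,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t} \bigr\rangle\bigr) \\
&= 2\operatorname{Re}\bigl(\bigl\langle t \mapsto r(t)\, e^{-i2\pi f_c t},\ g_{Q,\ell,\mathrm{BB}} \bigr\rangle\bigr) \\
&= \sqrt{2}\operatorname{Re}\bigl(\bigl\langle t \mapsto r(t)\, e^{-i2\pi f_c t},\ t \mapsto i\,g(t - \ell T_s) \bigr\rangle\bigr) \\
&= \sqrt{2}\operatorname{Im}\bigl(\bigl\langle t \mapsto r(t)\, e^{-i2\pi f_c t},\ t \mapsto g(t - \ell T_s) \bigr\rangle\bigr),
\end{aligned} \tag{16.31b}$$

where the fourth equality follows from (16.9b). Notice that the RHS of (16.31a)
and the RHS of (16.31b) do not involve any filtering. To see how to implement
them with real operations we can write them more explicitly as:

$$\langle r, g_{I,\ell} \rangle = \sqrt{2}\operatorname{Re}\Bigl(\int_{-\infty}^{\infty} r(t)\, e^{-i2\pi f_c t}\, g^*(t - \ell T_s)\,dt\Bigr),$$

$$\langle r, g_{Q,\ell} \rangle = \sqrt{2}\operatorname{Im}\Bigl(\int_{-\infty}^{\infty} r(t)\, e^{-i2\pi f_c t}\, g^*(t - \ell T_s)\,dt\Bigr),$$

or even more explicitly in terms of real operations as:

$$\begin{aligned}
\langle r, g_{I,\ell} \rangle &= \sqrt{2}\int_{-\infty}^{\infty} r(t)\cos(2\pi f_c t)\operatorname{Re}\bigl(g(t - \ell T_s)\bigr)\,dt \\
&\quad - \sqrt{2}\int_{-\infty}^{\infty} r(t)\sin(2\pi f_c t)\operatorname{Im}\bigl(g(t - \ell T_s)\bigr)\,dt,
\end{aligned} \tag{16.32a}$$

$$\begin{aligned}
\langle r, g_{Q,\ell} \rangle &= -\sqrt{2}\int_{-\infty}^{\infty} r(t)\cos(2\pi f_c t)\operatorname{Im}\bigl(g(t - \ell T_s)\bigr)\,dt \\
&\quad - \sqrt{2}\int_{-\infty}^{\infty} r(t)\sin(2\pi f_c t)\operatorname{Re}\bigl(g(t - \ell T_s)\bigr)\,dt.
\end{aligned} \tag{16.32b}$$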



The two approaches we discussed for computing the inner products are, of course, 
mathematically equivalent. The former makes more engineering sense, because the 
bandpass filter typically guarantees that the energy in s is significantly smaller 
than in r, thus reducing the dynamic range required from the rest of the receiver. 

The latter approach is mathematically cleaner because it requires less mathematical justification. One need not check that the various filters satisfy the required
integrability conditions. Moreover, this approach is more useful when r is not
energy-limited and when this is compensated for by the fast decay of the pulse
shape. (See, for example, the situation addressed by Proposition 3.4.4.)
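
The direct computation can be checked numerically. The following is a minimal sketch, not a receiver design: it approximates the integrals in (16.32a) and (16.32b) by midpoint Riemann sums and compares them with the complex form that precedes them. The signal r, the complex pulse g, the carrier frequency, and the integration window are all hypothetical choices made only for illustration.

```python
import cmath
import math


def inner_products_real_ops(r, g, fc, ell, Ts, t0, t1, n=20000):
    """Midpoint Riemann sums for (16.32a) and (16.32b), real operations only."""
    dt = (t1 - t0) / n
    in_phase = 0.0
    quadrature = 0.0
    for k in range(n):
        t = t0 + (k + 0.5) * dt
        gv = complex(g(t - ell * Ts))
        c = math.cos(2 * math.pi * fc * t)
        s = math.sin(2 * math.pi * fc * t)
        in_phase += r(t) * (c * gv.real - s * gv.imag) * dt
        quadrature -= r(t) * (c * gv.imag + s * gv.real) * dt
    return math.sqrt(2) * in_phase, math.sqrt(2) * quadrature


def inner_products_complex(r, g, fc, ell, Ts, t0, t1, n=20000):
    """Same quantities via sqrt(2) Re/Im of the integral of r(t) e^{-i 2 pi fc t} g*(t - ell Ts)."""
    dt = (t1 - t0) / n
    acc = 0j
    for k in range(n):
        t = t0 + (k + 0.5) * dt
        gv = complex(g(t - ell * Ts))
        acc += r(t) * cmath.exp(-2j * math.pi * fc * t) * gv.conjugate() * dt
    return math.sqrt(2) * acc.real, math.sqrt(2) * acc.imag


# Hypothetical test signals (assumptions, not taken from the text).
fc, Ts = 5.0, 1.0
g = lambda t: (1.0 + 0.5j) * math.exp(-t * t)                      # complex pulse
r = lambda t: math.exp(-t * t / 4) * math.cos(2 * math.pi * fc * t + 0.3)  # real signal

real_ops = inner_products_real_ops(r, g, fc, 1, Ts, -10.0, 10.0)
complex_ops = inner_products_complex(r, g, fc, 1, Ts, -10.0, 10.0)
print(real_ops, complex_ops)
```

Both routines evaluate the same sums, so their outputs should agree up to rounding; the point is only that no filtering step appears in either.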

16.9 Exercises 

Exercise 16.1 (Nyquist's Criterion and Passband Signals). Corollary 11.3.4 provides con- 
ditions under which the time shifts of a signal by integer multiples of T s are orthonormal. 
Discuss how these conditions apply to real passband signals of bandwidth W around the 
carrier frequency f c . Specifically: 

(i) Plot the function

$$f \mapsto \sum_{j=-\infty}^{\infty} \Bigl|\hat{y}\Bigl(f + \frac{j}{T_{\mathrm{s}}}\Bigr)\Bigr|^2$$

for the passband signal y of Figure 7.2. Pay attention to how the sum at positive
frequencies is influenced by the signal's FT at negative frequencies.

(ii) Show that there exists a passband signal $\phi(\cdot)$ whose bandwidth W around the
carrier frequency $f_{\mathrm{c}}$ is $1/(2T_{\mathrm{s}})$ and whose time shifts by integer multiples of $T_{\mathrm{s}}$ are
orthonormal if, and only if, $4T_{\mathrm{s}} f_{\mathrm{c}}$ is an odd integer. Show that such a signal must
satisfy (outside a set of frequencies of Lebesgue measure zero)

$$\bigl|\hat{\phi}(f)\bigr| = \sqrt{T_{\mathrm{s}}}\; \mathrm{I}\Bigl\{\bigl||f| - f_{\mathrm{c}}\bigr| \le \frac{1}{4T_{\mathrm{s}}}\Bigr\}, \quad f \in \mathbb{R}.$$

(iii) Let $\phi$ be an energy-limited baseband signal of bandwidth W/2 whose FT is a
symmetric function of frequency and whose time shifts by integer multiples of $(2T_{\mathrm{s}})$
are orthonormal. Let the carrier frequency $f_{\mathrm{c}}$ be larger than W/2 and satisfy
that $4T_{\mathrm{s}} f_{\mathrm{c}}$ is an odd integer. Show that the (possibly complex) passband signal
$t \mapsto \sqrt{2}\cos(2\pi f_{\mathrm{c}} t)\,\phi(t)$ is of bandwidth W around the carrier $f_{\mathrm{c}}$, and its time shifts
by integer multiples of $T_{\mathrm{s}}$ are orthonormal.

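Exercise 16.2 (How General is QAM?). Under what conditions on A, $f_{\mathrm{c}}$, $\phi$, W, and $T_{\mathrm{s}}$
can we view the signal

$$t \mapsto A\operatorname{Re}\Bigl(e^{i(2\pi f_{\mathrm{c}} t + \phi)} \sum_{\ell=1}^{n} C_\ell \operatorname{sinc}(Wt - \ell)\Bigr)$$

as a QAM signal?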

Exercise 16.3 (M-PSK). Consider a QAM signal $X_{\mathrm{PB}}$ of the form (16.6) with the pulse
shape $g\colon t \mapsto \mathrm{I}\{-T_{\mathrm{s}}/2 \le t < T_{\mathrm{s}}/2\}$ and symbols $(C_\ell)$ that are IID and uniformly distributed over the set

$$\bigl\{e^{i2\pi/8},\ e^{2i2\pi/8},\ \ldots,\ e^{7i2\pi/8},\ 1\bigr\}.$$

(i) Plot a sample function of $(X_{\mathrm{PB}}(t),\ t \in \mathbb{R})$.

(ii) Are the sample paths continuous?

(iii) Express $X_{\mathrm{PB}}(t)$ in the form $2A\cos(2\pi f_{\mathrm{c}} t + \Phi(t))$ and describe $\Phi(t)$. Plot a sample
path of $(\Phi(t))$.

Exercise 16.4 (Transmission Rate, Encoder Rate, and Bandwidth). Data bits are to be
transmitted at rate $R_{\mathrm{b}}$ bits per second using QAM with a pulse shape $\phi$ satisfying the
orthonormality condition (16.11).

(i) Let W be the allotted bandwidth around the carrier frequency. What is the minimal
constellation size required for the data bits to be reliably communicated in the
absence of noise?

(ii) Repeat Part (i) if you are required to use a pulse shape with an excess bandwidth of
15% or more.

Exercise 16.5 (Synthesis of 16-QAM). Let $X_1(\cdot)$ and $X_2(\cdot)$ be 4-QAM (QPSK) signals
that are given for every $t \in \mathbb{R}$ by

$$X_\nu(t) = 2A\operatorname{Re}\Bigl(\sum_{\ell=1}^{n} C_\ell^{(\nu)}\, g(t - \ell T_{\mathrm{s}})\, e^{i2\pi f_{\mathrm{c}} t}\Bigr), \quad \nu = 1, 2,$$

where the symbols $(C_\ell^{(\nu)})$ take on the values $\pm 1 \pm i$. Show that for the right choice of the
constant $\alpha \in \mathbb{R}$, the signal

$$X(t) = \alpha X_1(t) + X_2(t), \quad t \in \mathbb{R}$$

can be viewed as a 16-QAM signal with a square constellation.

Exercise 16.6 (Orthogonality of the In-Phase and Quadrature Components). Let the
pulse shape g be a real integrable signal that is bandlimited to W/2 Hz, and let the
carrier frequency $f_{\mathrm{c}}$ be larger than W/2. Show that, even if the time shifts of g by
integer multiples of $T_{\mathrm{s}}$ are not orthonormal, the signals

$$t \mapsto g(t - \ell T_{\mathrm{s}})\cos(2\pi f_{\mathrm{c}} t + \varphi) \quad\text{and}\quad t \mapsto g(t - \ell' T_{\mathrm{s}})\sin(2\pi f_{\mathrm{c}} t + \varphi)$$

are orthogonal for all integers $\ell, \ell'$ (not necessarily distinct). Here $\varphi \in [-\pi, \pi)$ is arbitrary.

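Exercise 16.7 (The Importance of the Phase). Let x and y be real integrable signals
that are bandlimited to W/2 Hz. Let the transmitted signal s be

$$\begin{aligned}
s(t) &= \operatorname{Re}\bigl((x(t) + i\,y(t))\, e^{i(2\pi f_{\mathrm{c}} t + \phi_{\mathrm{T}})}\bigr) \\
&= x(t)\cos(2\pi f_{\mathrm{c}} t + \phi_{\mathrm{T}}) - y(t)\sin(2\pi f_{\mathrm{c}} t + \phi_{\mathrm{T}}), \quad t \in \mathbb{R},
\end{aligned}$$

where $f_{\mathrm{c}} > W/2$, and where $\phi_{\mathrm{T}}$ denotes the phase of the transmitted carrier. The receiver
multiplies s(t) by $2\cos(2\pi f_{\mathrm{c}} t + \phi_{\mathrm{R}})$ (where $\phi_{\mathrm{R}}$ denotes the phase of the receiver's oscillator)
and passes the resulting product through a lowpass filter of cutoff frequency W/2 to
produce the signal $\tilde{x}$:

$$\tilde{x}(t) = \Bigl(\bigl(\tau \mapsto s(\tau)\, 2\cos(2\pi f_{\mathrm{c}} \tau + \phi_{\mathrm{R}})\bigr) \star \mathrm{LPF}_{W/2}\Bigr)(t), \quad t \in \mathbb{R}.$$

Express $\tilde{x}(\cdot)$ in terms of $x(\cdot)$, $y(\cdot)$, $\phi_{\mathrm{T}}$ and $\phi_{\mathrm{R}}$. Evaluate your expression in the following
cases: $\phi_{\mathrm{T}} = \phi_{\mathrm{R}}$, $\phi_{\mathrm{T}} - \phi_{\mathrm{R}} = \pi$, $\phi_{\mathrm{T}} - \phi_{\mathrm{R}} = \pi/2$, and $\phi_{\mathrm{T}} - \phi_{\mathrm{R}} = \pi/4$.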

Exercise 16.8 (Phase Imprecision). Consider QAM with a real pulse shape and a receiver
that performs a conversion to baseband followed by matched filtering (Section 16.8.1).
Write an expression for the output of the receiver if its oscillator is at the right frequency
but lags the phase of the transmitter's oscillator by $\Delta\phi$.

Exercise 16.9 (Rotating a QAM Constellation). Show that rotating a QAM constellation
changes neither its second moment nor its minimum distance.

Exercise 16.10 (Optimal Rectangular Constellation). Consider all rectangular constellations of the form

$$\{a + ib,\ a - ib,\ -a + ib,\ -a - ib\},$$

where a and b are real. Which of these constellations whose second moment is one has
the largest minimum distance?



Chapter 17 

Complex Random Variables and Processes 

17.1 Introduction 

We first encountered complex random variables in Chapter 16 on QAM. There we
considered an encoder that maps k-tuples of bits into n-tuples of complex numbers,
and we then considered the result of applying this encoder to random bits. The
resulting symbols were therefore random and were taking value in the complex
field, i.e., they were complex random variables. Complex random variables are
functions that map "luck" into the complex field: they map every outcome of the
experiment $\omega \in \Omega$ to a complex number. Thus, they are very much like regular
random variables, except that they take value in the complex field. They can
always be considered as pairs of real variables: their real and imaginary parts.

It is perfectly meaningful to discuss their expectation and variance. If C is a 
complex random variable, then 

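$$\mathrm{E}[C] = \mathrm{E}[\operatorname{Re}(C)] + i\,\mathrm{E}[\operatorname{Im}(C)],$$

$$\mathrm{E}\bigl[|C|^2\bigr] = \mathrm{E}\bigl[(\operatorname{Re}(C))^2\bigr] + \mathrm{E}\bigl[(\operatorname{Im}(C))^2\bigr],$$

and

$$\operatorname{Var}[C] = \mathrm{E}\bigl[|C - \mathrm{E}[C]|^2\bigr] = \mathrm{E}\bigl[|C|^2\bigr] - \bigl|\mathrm{E}[C]\bigr|^2.$$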

In this chapter we shall make the above definition of complex random variables 
more formal and also discuss complex random vectors and complex stochastic pro- 
cesses. 

Complex random variables can be avoided if one treats such variables as pairs 
of real variables. However, we do not recommend this approach. Many of the 
complex variables and processes encountered in Digital Communications possess 
additional properties that simplify their manipulation, and complex variables are 
better suited to take advantage of these simplifications. 

We begin this chapter with some notation followed by some basic definitions for
complex random variables. We next introduce a property that simplifies their
manipulation: properness. (Another such property, circular symmetry, is described
in Chapter 24.) Finally, we extend the discussion to complex random vectors and
conclude with a discussion of complex stochastic processes.



17.2 Notation 



The notation we use in this chapter is fairly standard. The only issue that may 
need clarification is the difference between three matrix/vector operations: trans- 
position, conjugation, and Hermitian conjugation. These operations are described 
next. 

All vectors in this chapter are column vectors. Thus, a vector $\mathbf{a}$ whose components
are $a^{(1)}, \ldots, a^{(n)}$ is the column vector

$$\mathbf{a} = \begin{pmatrix} a^{(1)} \\ \vdots \\ a^{(n)} \end{pmatrix}. \tag{17.1}$$

We shall sometimes refer to such a vector $\mathbf{a}$ as an n-vector to make the number of
its components explicit. For typesetting reasons, we shall usually use the notation

$$\mathbf{a} = \bigl(a^{(1)}, \ldots, a^{(n)}\bigr)^{\mathsf{T}}, \tag{17.2}$$

which is more space efficient. Here the operator $(\cdot)^{\mathsf{T}}$ denotes the matrix transpose. Thus if we think of $(a^{(1)}, \ldots, a^{(n)})$ as a 1 x n matrix, then $(a^{(1)}, \ldots, a^{(n)})^{\mathsf{T}}$ is
this matrix's transpose, i.e., an n x 1 matrix, or a vector. More generally, if A is
an n x m matrix, then $\mathsf{A}^{\mathsf{T}}$ is an m x n matrix whose Row-j Column-$\ell$ component
is the Row-$\ell$ Column-j component of A. We say that A is symmetric if $\mathsf{A}^{\mathsf{T}} = \mathsf{A}$.

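We use $(\cdot)^*$ to denote componentwise complex conjugation. Thus, if $\mathbf{a}$ is as
in (17.1), then

$$\mathbf{a}^* = \begin{pmatrix} (a^{(1)})^* \\ \vdots \\ (a^{(n)})^* \end{pmatrix}. \tag{17.3}$$

We use $(\cdot)^\dagger$ to denote Hermitian conjugation, i.e., the componentwise conjugate
of the transposed matrix. Thus, if $\mathbf{a}$ is as in (17.1), then $\mathbf{a}^\dagger$ is the 1 x n matrix

$$\mathbf{a}^\dagger = \bigl((a^{(1)})^*, \ldots, (a^{(n)})^*\bigr). \tag{17.4}$$

The Hermitian conjugate $\mathsf{A}^\dagger$ of an n x m matrix A is an m x n matrix whose Row-j
Column-$\ell$ component is the complex conjugate of the Row-$\ell$ Column-j component
of the matrix A. We say that a matrix A is conjugate-symmetric or self-adjoint
or Hermitian if $\mathsf{A}^\dagger = \mathsf{A}$.

Note that if $\mathbf{a}$ and $\mathbf{b}$ are n-vectors, then $\mathbf{a}^{\mathsf{T}}\mathbf{b}$ is a scalar

$$\mathbf{a}^{\mathsf{T}}\mathbf{b} = \sum_{j=1}^{n} a^{(j)} b^{(j)},$$

whereas $\mathbf{a}\mathbf{b}^{\mathsf{T}}$ is the n x n matrix

$$\mathbf{a}\mathbf{b}^{\mathsf{T}} = \begin{pmatrix}
a^{(1)}b^{(1)} & a^{(1)}b^{(2)} & \cdots & a^{(1)}b^{(n)} \\
a^{(2)}b^{(1)} & a^{(2)}b^{(2)} & \cdots & a^{(2)}b^{(n)} \\
\vdots & \vdots & \ddots & \vdots \\
a^{(n)}b^{(1)} & a^{(n)}b^{(2)} & \cdots & a^{(n)}b^{(n)}
\end{pmatrix}. \tag{17.5}$$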
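
The three operations of this section are easy to confuse, so a small sketch may help. It uses plain Python lists of complex numbers; the helper names `transpose`, `conjugate`, `hermitian`, and `matmul`, and the sample vectors, are hypothetical and chosen only for illustration.

```python
def transpose(A):
    """Matrix transpose: rows become columns."""
    return [list(col) for col in zip(*A)]


def conjugate(A):
    """Componentwise complex conjugation."""
    return [[x.conjugate() for x in row] for row in A]


def hermitian(A):
    """Hermitian conjugation: conjugate of the transposed matrix."""
    return conjugate(transpose(A))


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Column 2-vectors are stored as 2 x 1 matrices (sample values are assumptions).
a = [[1 + 2j], [3 - 1j]]
b = [[2 - 1j], [1j]]

aTb = matmul(transpose(a), b)    # 1 x 1 matrix holding the scalar a^T b
abT = matmul(a, transpose(b))    # 2 x 2 matrix a b^T
aHa = matmul(hermitian(a), a)    # 1 x 1 matrix: sum of squared magnitudes

H = [[2 + 0j, 1 + 1j], [1 - 1j, 3 + 0j]]
print(aTb, abT, aHa, hermitian(H) == H)
```

Note that `aHa` is real and nonnegative, whereas `aTb` with b = a would in general be complex; this difference is what later motivates defining covariances with the conjugation.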



17.3 Complex Random Variables 

We say that C is a complex random variable (CRV) on the probability space
$(\Omega, \mathcal{F}, P)$ if $C\colon \Omega \to \mathbb{C}$ is a mapping from $\Omega$ to the complex field $\mathbb{C}$ such that both
Re(C) and Im(C) are random variables on $(\Omega, \mathcal{F}, P)$.

Any CRV Z can be written in the form Z = X + i Y, where X and Y are real 
random variables. But there are some advantages to studying complex random 
variables over pairs of real random variables. Those will become apparent when we 
discuss analytic functions of complex random variables and when we discuss com- 
plex random variables that have special properties such as that of being "proper" 
or that of being "circularly-symmetric." 

Many of the definitions related to complex random variables are similar to the 
analogous definitions for pairs of real random variables, but some are not. We 
shall try to emphasize the latter. 



17.3.1 Distribution and Density 

Since it makes no sense to say that one complex number is smaller than another, we
cannot define the cumulative distribution function (CDF) of a CRV as in the real
case: an expression like "$\Pr[Z \le 1 + i]$" is meaningless. We can, however, discuss
the joint distribution function of the real and imaginary parts of a CRV, which
specifies $\Pr[\operatorname{Re}(Z) \le x,\ \operatorname{Im}(Z) \le y]$ for all $x, y \in \mathbb{R}$. We say that two complex
random variables W and Z are of equal law (or have the same distribution) and
write $W \overset{\mathscr{L}}{=} Z$, if the joint distribution of the pair (Re(W), Im(W)) is identical to
the joint distribution of the pair (Re(Z), Im(Z)):

$$\bigl(W \overset{\mathscr{L}}{=} Z\bigr) \iff \Bigl(\Pr[\operatorname{Re}(W) \le x,\ \operatorname{Im}(W) \le y] = \Pr[\operatorname{Re}(Z) \le x,\ \operatorname{Im}(Z) \le y], \quad x, y \in \mathbb{R}\Bigr). \tag{17.6}$$

Similarly, we can define the density function $f_Z(\cdot)$ (if it exists) of a CRV Z at the
point $z \in \mathbb{C}$ as the joint density of the real pair (Re(Z), Im(Z)) at (Re(z), Im(z)):

$$f_Z(z) = f_{\operatorname{Re}(Z),\operatorname{Im}(Z)}\bigl(\operatorname{Re}(z), \operatorname{Im}(z)\bigr), \quad z \in \mathbb{C}, \tag{17.7}$$

which can also be written as

$$f_Z(z) = \frac{\partial^2}{\partial x\, \partial y} \Pr[\operatorname{Re}(Z) \le x,\ \operatorname{Im}(Z) \le y]\,\bigg|_{x = \operatorname{Re}(z),\ y = \operatorname{Im}(z)}, \quad z \in \mathbb{C}. \tag{17.8}$$

The notions of distribution function and density of a CRV extend immediately to
pairs of complex variables and, more generally, to n-tuples.



17.3.2 The Expectation 

The expectation of a CRV can be defined in terms of the expectations of its real 
and imaginary parts: 

E[Z] = E[Re(Z)] + iE[Im(Z)], (17.9) 

provided that the two real expectations E[Re(Z)] and E[Im(Z)] are finite. With 
this definition one can readily verify that, whenever E[Z] is defined, conjugation 
and expectation commute 

E[Z*] = (E[Z])*, (17.10) 

and 

Re(E[Z]) = E[Re(Z)], (17.11a)

Im(E[Z]) = E[Im(Z)]. (17.11b)

If the CRV Z has a density $f_Z(\cdot)$, then the expectation E[g(Z)] for some measurable
function $g\colon \mathbb{C} \to \mathbb{C}$ can be formally written as

$$\mathrm{E}[g(Z)] = \int_{z \in \mathbb{C}} f_Z(z)\, g(z)\, dz \tag{17.12}$$

or, in terms of real integrals, as

$$\mathrm{E}[g(Z)] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f_Z(x + iy)\operatorname{Re}\bigl(g(x + iy)\bigr)\,dx\,dy + i\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} f_Z(x + iy)\operatorname{Im}\bigl(g(x + iy)\bigr)\,dx\,dy. \tag{17.13}$$

Thus, rather than computing the distribution of g(Z) and then computing the
expectations of its real and imaginary parts, one can use (17.12).
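
As a rough numerical illustration of (17.13), the sketch below approximates the two real double integrals by a midpoint rule on a square grid. The density used, $f_Z(z) = e^{-|z|^2}/\pi$, is an assumption made only for this example (it has not been introduced in the text at this point), as are the function name `expectation` and the grid parameters.

```python
import math


def expectation(f_Z, g, lim=6.0, n=300):
    """Midpoint-rule approximation of (17.13) over the square [-lim, lim]^2."""
    h = 2 * lim / n
    re_part = 0.0
    im_part = 0.0
    for a in range(n):
        x = -lim + (a + 0.5) * h
        for b in range(n):
            y = -lim + (b + 0.5) * h
            z = complex(x, y)
            weight = f_Z(z) * h * h
            value = complex(g(z))
            re_part += weight * value.real   # first double integral in (17.13)
            im_part += weight * value.imag   # second double integral in (17.13)
    return complex(re_part, im_part)


# Assumed density for illustration only: exp(-|z|^2) / pi.
f_Z = lambda z: math.exp(-abs(z) ** 2) / math.pi

total_mass = expectation(f_Z, lambda z: 1.0)          # should be close to 1
second_moment = expectation(f_Z, lambda z: abs(z) ** 2)
mean_of_square = expectation(f_Z, lambda z: z * z)
print(total_mass, second_moment, mean_of_square)
```

For this particular density the second moment is one and the expectation of $Z^2$ vanishes, which the test values below reflect up to discretization error.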




17.3.3 The Variance 

The definition of the variance of a CRV is not consistent with viewing the CRV as 
a pair of real random variables. The variance Var[Z] of a CRV Z is defined as 

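$$\begin{aligned}
\operatorname{Var}[Z] &\triangleq \mathrm{E}\bigl[|Z - \mathrm{E}[Z]|^2\bigr] &\text{(17.14a)}\\
&= \mathrm{E}\bigl[|Z|^2\bigr] - \bigl|\mathrm{E}[Z]\bigr|^2 &\text{(17.14b)}\\
&= \operatorname{Var}[\operatorname{Re}(Z)] + \operatorname{Var}[\operatorname{Im}(Z)]. &\text{(17.14c)}
\end{aligned}$$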

This definition should be contrasted with the definition of the covariance matrix 
of the pair (Re(Z),Im(Z)) 

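$$\begin{pmatrix}
\operatorname{Var}[\operatorname{Re}(Z)] & \operatorname{Cov}[\operatorname{Re}(Z), \operatorname{Im}(Z)] \\
\operatorname{Cov}[\operatorname{Re}(Z), \operatorname{Im}(Z)] & \operatorname{Var}[\operatorname{Im}(Z)]
\end{pmatrix}.$$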

One can compute the variance of Z from the covariance matrix of (Re(Z),Im(Z)), 
but not the other way around. Indeed, the variance of Z is just the trace of the 
covariance matrix of (Re(Z),Im(Z)). 

To derive (17.14b) from (17.14a) we note that 

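$$\begin{aligned}
\mathrm{E}\bigl[|Z - \mathrm{E}[Z]|^2\bigr] &= \mathrm{E}\bigl[(Z - \mathrm{E}[Z])(Z - \mathrm{E}[Z])^*\bigr] \\
&= \mathrm{E}\bigl[(Z - \mathrm{E}[Z])(Z^* - \mathrm{E}[Z^*])\bigr] \\
&= \mathrm{E}\bigl[(Z - \mathrm{E}[Z])Z^*\bigr] - \mathrm{E}\bigl[(Z - \mathrm{E}[Z])\bigr]\,\mathrm{E}[Z^*] \\
&= \mathrm{E}\bigl[(Z - \mathrm{E}[Z])Z^*\bigr] \\
&= \mathrm{E}[ZZ^*] - \mathrm{E}[Z]\,\mathrm{E}[Z^*] \\
&= \mathrm{E}\bigl[|Z|^2\bigr] - \bigl|\mathrm{E}[Z]\bigr|^2,
\end{aligned}$$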

where we only used the linearity of expectation and (17.10). Here the first equality 
follows by writing |w| 2 as ww*; the second by (17.10); the third by simple algebra; 
the fourth because the expectation of Z — E[Z] is zero; and the final by (17.10). 

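To derive (17.14c) from (17.14b) we write $\mathrm{E}[|Z|^2]$ as $\mathrm{E}[(\operatorname{Re}(Z))^2 + (\operatorname{Im}(Z))^2]$ and
express $|\mathrm{E}[Z]|^2$ using (17.9) as $\mathrm{E}[\operatorname{Re}(Z)]^2 + \mathrm{E}[\operatorname{Im}(Z)]^2$.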
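
The three expressions in (17.14) and the trace relation can be verified on a small discrete CRV. The sketch below is only an illustration; the three support points and their probabilities are arbitrary assumptions.

```python
# A CRV taking three values with the given probabilities (hypothetical example).
values = [1 + 1j, -1 + 2j, 0.5 - 1j]
probs = [0.5, 0.25, 0.25]


def expect(xs):
    """Expectation of a quantity listed outcome by outcome."""
    return sum(p * x for p, x in zip(probs, xs))


mean = expect(values)

var_a = expect([abs(z - mean) ** 2 for z in values])           # (17.14a)
var_b = expect([abs(z) ** 2 for z in values]) - abs(mean) ** 2  # (17.14b)

re = [z.real for z in values]
im = [z.imag for z in values]


def real_cov(xs, ys):
    """Covariance of two real random variables on the same outcomes."""
    return expect([x * y for x, y in zip(xs, ys)]) - expect(xs) * expect(ys)


# Covariance matrix of the real pair (Re(Z), Im(Z)).
K = [[real_cov(re, re), real_cov(re, im)],
     [real_cov(im, re), real_cov(im, im)]]

var_c = K[0][0] + K[1][1]                                       # (17.14c): the trace
print(var_a, var_b, var_c, K)
```

The off-diagonal entry of `K` is not recoverable from the variance alone, which is the sense in which the variance carries less information than the covariance matrix.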

17.3.4 Proper Complex Random Variables 

Many of the complex random variables that appear in Digital Communications 
are proper. This is a concept that has no natural counterpart for real random 
variables. 

Definition 17.3.1 (Proper CRV). We say that the CRV Z is proper if the following 
three conditions are all satisfied: it is of zero-mean; it is of finite-variance; and 

$$\mathrm{E}\bigl[Z^2\bigr] = 0. \tag{17.15}$$

Notice that the LHS of (17.15) is, in general, a complex number, so (17.15) is
equivalent to two real equations:

$$\mathrm{E}\bigl[\operatorname{Re}(Z)^2\bigr] = \mathrm{E}\bigl[\operatorname{Im}(Z)^2\bigr] \tag{17.16a}$$

and

$$\mathrm{E}\bigl[\operatorname{Re}(Z)\operatorname{Im}(Z)\bigr] = 0. \tag{17.16b}$$

This leads to the following characterization of proper complex random variables. 

Proposition 17.3.2. A CRV Z is proper if, and only if, all three of the following
conditions are satisfied: Z is of zero mean; Re(Z) and Im(Z) have the same finite
variance; and Re(Z) and Im(Z) are uncorrelated.

An example of a proper CRV is one taking on the four values {±1, ±i} equiprobably. 

We mentioned earlier in Section 17.3.3 that the variance of a CRV is not the 
same as the covariance matrix of the tuple consisting of its real and imaginary 
parts. While the covariance matrix determines the variance, the variance does not 
uniquely determine the covariance matrix. However, if a CRV is proper, then its 
variance uniquely determines the covariance matrix of its real and imaginary parts. 
Indeed, by Proposition 17.3.2, a zero-mean finite-variance CRV is proper if, and
only if, the covariance matrix of the pair (Re(Z), Im(Z)) is given by

$$\begin{pmatrix}
\tfrac{1}{2}\operatorname{Var}[Z] & 0 \\
0 & \tfrac{1}{2}\operatorname{Var}[Z]
\end{pmatrix}.$$
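
For a CRV with finitely many values, the conditions of Definition 17.3.1 can be checked directly. The following sketch does this for the four-point example mentioned above and for a two-point real-valued CRV that fails the test; the function name `is_proper` and the tolerance are hypothetical.

```python
def is_proper(values, probs, tol=1e-12):
    """Check zero mean and E[Z^2] = 0 for a finitely supported CRV.

    Finite variance holds automatically when the support is finite.
    """
    mean = sum(p * z for p, z in zip(probs, values))
    mean_of_square = sum(p * z * z for p, z in zip(probs, values))
    return abs(mean) <= tol and abs(mean_of_square) <= tol


# The example from the text: equiprobable on {+1, -1, +i, -i}.
four_point = [1 + 0j, -1 + 0j, 1j, -1j]
print(is_proper(four_point, [0.25] * 4))

# A zero-mean CRV that is not proper: equiprobable on {+1, -1}, so E[Z^2] = 1.
two_point = [1 + 0j, -1 + 0j]
print(is_proper(two_point, [0.5, 0.5]))

# Variances of the real and imaginary parts of the four-point CRV.
var_re = sum(0.25 * z.real ** 2 for z in four_point)
var_im = sum(0.25 * z.imag ** 2 for z in four_point)
print(var_re, var_im)
```

The two-point CRV has all of its variance in the real part, so its covariance matrix is not a multiple of the identity even though its mean is zero.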

17.3.5 The Covariance 

The covariance Cov[Z, W] between the complex random variables Z and W is 
defined by 

$$\operatorname{Cov}[Z, W] = \mathrm{E}\bigl[(Z - \mathrm{E}[Z])(W - \mathrm{E}[W])^*\bigr]. \tag{17.17}$$

Again, this definition is different from the one for pairs of real random variables: 
the covariance between two pairs of real random variables is a real matrix, whereas 
the covariance between two CRVs is a complex scalar. 

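Some of the key properties of the covariance are listed next. They hold whenever
the $\alpha$'s and $\beta$'s are deterministic complex numbers and the covariances on the RHS
are defined.

(i) Conjugate Symmetry:

$$\operatorname{Cov}[Z, W] = \bigl(\operatorname{Cov}[W, Z]\bigr)^*. \tag{17.18}$$

(ii) Sesquilinearity:

$$\operatorname{Cov}[\alpha Z, W] = \alpha \operatorname{Cov}[Z, W], \tag{17.19}$$

$$\operatorname{Cov}[Z_1 + Z_2, W] = \operatorname{Cov}[Z_1, W] + \operatorname{Cov}[Z_2, W], \tag{17.20}$$

$$\operatorname{Cov}[Z, \beta W] = \beta^* \operatorname{Cov}[Z, W], \tag{17.21}$$

$$\operatorname{Cov}[Z, W_1 + W_2] = \operatorname{Cov}[Z, W_1] + \operatorname{Cov}[Z, W_2], \tag{17.22}$$

and, more generally,

$$\operatorname{Cov}\Bigl[\sum_{j=1}^{n} \alpha_j Z_j,\ \sum_{j'=1}^{n'} \beta_{j'} W_{j'}\Bigr] = \sum_{j=1}^{n}\sum_{j'=1}^{n'} \alpha_j \beta_{j'}^* \operatorname{Cov}[Z_j, W_{j'}]. \tag{17.23}$$

(iii) Relation with Variance:

$$\operatorname{Var}[Z] = \operatorname{Cov}[Z, Z]. \tag{17.24}$$

(iv) Variance of Linear Functionals:

$$\operatorname{Var}\Bigl[\sum_{j=1}^{n} \alpha_j Z_j\Bigr] = \sum_{j=1}^{n}\sum_{j'=1}^{n} \alpha_j \alpha_{j'}^* \operatorname{Cov}[Z_j, Z_{j'}]. \tag{17.25}$$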
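
The properties above can be spot-checked on a small joint distribution. In this sketch the pair (Z, W) takes three equally indexed values with the listed probabilities; all numbers, and the helper name `cov`, are assumptions made for illustration.

```python
# Joint distribution: outcome k occurs with probability probs[k] and yields (zs[k], ws[k]).
zs = [1 + 1j, -2j, 0.5 + 0j]
ws = [1j, 2 - 1j, -1 + 0j]
probs = [0.2, 0.3, 0.5]


def cov(xs, ys):
    """Cov[X, Y] = E[(X - E[X]) (Y - E[Y])*] as in (17.17)."""
    mx = sum(p * x for p, x in zip(probs, xs))
    my = sum(p * y for p, y in zip(probs, ys))
    return sum(p * (x - mx) * (y - my).conjugate()
               for p, x, y in zip(probs, xs, ys))


alpha, beta = 2 - 1j, 0.5 + 3j

c_zw = cov(zs, ws)
c_wz = cov(ws, zs)
c_scaled = cov([alpha * z for z in zs], [beta * w for w in ws])
var_z = cov(zs, zs)

# Variance of alpha*Z + beta*W, computed directly and via (17.25).
combo = [alpha * z + beta * w for z, w in zip(zs, ws)]
var_direct = cov(combo, combo)
var_formula = (alpha * alpha.conjugate() * cov(zs, zs)
               + alpha * beta.conjugate() * cov(zs, ws)
               + beta * alpha.conjugate() * cov(ws, zs)
               + beta * beta.conjugate() * cov(ws, ws))
print(c_zw, c_wz, c_scaled, var_z, var_direct, var_formula)
```

The conjugate on the second argument is what makes `cov(zs, zs)` real and nonnegative.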



17.3.6 The Characteristic Function 

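The definition of the characteristic function of a CRV is consistent with viewing it as
a pair of real random variables. Recall that the characteristic function $\Phi_X\colon \mathbb{R} \to \mathbb{C}$
of a real random variable X is defined by

$$\Phi_X\colon \varpi \mapsto \mathrm{E}\bigl[e^{i\varpi X}\bigr], \quad \varpi \in \mathbb{R}. \tag{17.26}$$

For a pair of real random variables X, Y the joint characteristic function is the
mapping $\Phi_{X,Y}\colon \mathbb{R}^2 \to \mathbb{C}$ defined by

$$\Phi_{X,Y}\colon (\varpi_1, \varpi_2) \mapsto \mathrm{E}\bigl[e^{i(\varpi_1 X + \varpi_2 Y)}\bigr], \quad \varpi_1, \varpi_2 \in \mathbb{R}. \tag{17.27}$$

Note that the expectations in (17.26) and (17.27) are always defined, because the
argument to the expectation operator is of modulus one ($|e^{ir}| = 1$, whenever r is
real). This motivates us to define the characteristic function for a complex random
variable as follows.

Definition 17.3.3 (Characteristic Function of a CRV). The characteristic function $\Phi_Z\colon \mathbb{C} \to \mathbb{C}$ of a complex random variable Z is defined as

$$\begin{aligned}
\Phi_Z(\varpi) &\triangleq \mathrm{E}\bigl[e^{i\operatorname{Re}(\varpi^* Z)}\bigr], \quad \varpi \in \mathbb{C} \\
&= \mathrm{E}\bigl[e^{i(\operatorname{Re}(\varpi)\operatorname{Re}(Z) + \operatorname{Im}(\varpi)\operatorname{Im}(Z))}\bigr], \quad \varpi \in \mathbb{C}.
\end{aligned}$$

Here we can think of $\operatorname{Re}(\varpi)$ and $\operatorname{Im}(\varpi)$ as playing the role of $\varpi_1$ and $\varpi_2$ in (17.27).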
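
To see the two forms of the definition agree, one can evaluate both for a finitely supported CRV. The sketch below uses the CRV that is equiprobable on the four values ±1, ±i; the function names and the test point are hypothetical.

```python
import cmath
import math


def char_fn(values, probs, w):
    """Phi_Z(w) = E[exp(i Re(w* Z))] for a finitely supported CRV."""
    return sum(p * cmath.exp(1j * (w.conjugate() * z).real)
               for p, z in zip(probs, values))


def joint_char_fn(values, probs, w1, w2):
    """Joint characteristic function of the real pair (Re(Z), Im(Z))."""
    return sum(p * cmath.exp(1j * (w1 * z.real + w2 * z.imag))
               for p, z in zip(probs, values))


values = [1 + 0j, -1 + 0j, 1j, -1j]
probs = [0.25] * 4
w = 0.7 - 1.3j   # arbitrary test point

phi = char_fn(values, probs, w)
phi_joint = joint_char_fn(values, probs, w.real, w.imag)

# For this CRV the sum can be evaluated by hand: (cos(Re w) + cos(Im w)) / 2.
phi_closed = (math.cos(w.real) + math.cos(w.imag)) / 2
print(phi, phi_joint, phi_closed)
```

The value at the origin is always one, and the modulus never exceeds one, because each term in the expectation has modulus one.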



17.3.7 Transforming Complex Variables 

We next calculate the density of the result of applying a (deterministic) transfor- 
mation to a CRV. The key to the calculation is to treat the CRV as a pair of real 
random variables and to then apply the analogous result regarding the transfor- 
mation of a random real tuple. To that end we recall the following basic theorem 
regarding the transformation of real random vectors. In the theorem's statement 
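we encounter the notion of an open subset of $\mathbb{R}^n$. Loosely speaking, $\mathcal{D} \subseteq \mathbb{R}^n$ is an
open subset of $\mathbb{R}^n$ if to each $\mathbf{x} \in \mathcal{D}$ there corresponds some $\epsilon > 0$ such that the
ball of radius $\epsilon$ and center $\mathbf{x}$ is fully contained in $\mathcal{D}$.¹

¹ Thus, $\mathcal{D}$ is an open subset of $\mathbb{R}^n$ if $\mathcal{D} \subseteq \mathbb{R}^n$ and if to each $\mathbf{x} \in \mathcal{D}$ there corresponds some
$\epsilon > 0$ such that each $\mathbf{y} \in \mathbb{R}^n$ satisfying $(\mathbf{x} - \mathbf{y})^{\mathsf{T}}(\mathbf{x} - \mathbf{y}) < \epsilon^2$ is in $\mathcal{D}$.

Theorem 17.3.4 (Transforming Real Random Vectors). Let $g\colon \mathcal{D} \to \mathcal{R}$ be a one-to-one mapping from an open subset $\mathcal{D}$ of $\mathbb{R}^n$ onto a subset $\mathcal{R}$ of $\mathbb{R}^n$. Assume
that g has continuous partial derivatives in $\mathcal{D}$ and that the Jacobian determinant
$\det\bigl(\partial g(\mathbf{x})/\partial \mathbf{x}\bigr)$ is at no point of $\mathcal{D}$ zero. Let the real random n-vector $\mathbf{X}$ have
the density function $f_{\mathbf{X}}(\cdot)$ and satisfy $\Pr[\mathbf{X} \in \mathcal{D}] = 1$. Then the random n-vector
$\mathbf{Y} = g(\mathbf{X})$ is of density

$$f_{\mathbf{Y}}(\mathbf{y}) = \frac{f_{\mathbf{X}}(\mathbf{x})}{\Bigl|\det \dfrac{\partial g(\mathbf{x})}{\partial \mathbf{x}}\Bigr|}\,\Bigg|_{\mathbf{x} = g^{-1}(\mathbf{y})} \cdot \mathrm{I}\{\mathbf{y} \in \mathcal{R}\}. \tag{17.28}$$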



Using Theorem 17.3.4 we can relate the density of a CRV Z and the joint distribution of its phase and magnitude.

Lemma 17.3.5 (The Joint Density of the Magnitude and Phase of a CRV). Let Z
be a CRV of density $f_Z(\cdot)$, and let $R = |Z|$ and $\Theta \in [-\pi, \pi)$ be the magnitude and
argument of Z:

$$Z = R\, e^{i\Theta}, \quad R \ge 0, \quad \Theta \in [-\pi, \pi).$$

Then the joint distribution of the pair $(R, \Theta)$ is of density

$$f_{R,\Theta}(r, \theta) = r\, f_Z\bigl(r e^{i\theta}\bigr), \quad r > 0, \quad \theta \in [-\pi, \pi). \tag{17.29}$$

Proof. This result follows directly from Theorem 17.3.4 by computing the absolute
value of the Jacobian determinant of the transformation² $(x, y) \mapsto (r, \theta)$ where
$r = \sqrt{x^2 + y^2}$ and $\theta = \tan^{-1}(y/x)$:

$$\left|\det \begin{pmatrix} \dfrac{\partial r}{\partial x} & \dfrac{\partial r}{\partial y} \\[2mm] \dfrac{\partial \theta}{\partial x} & \dfrac{\partial \theta}{\partial y} \end{pmatrix}\right| = \frac{1}{r}. \qquad \Box$$

² Here $\mathcal{D}$ is the set $\mathbb{R}^2$ without the origin.

For the next change-of-variables result we recall some basic concepts from Complex
Analysis. Given some $z_0 \in \mathbb{C}$ and some nonnegative real number $r \ge 0$, we denote
by $\mathcal{D}(z_0, r)$ the disc of radius r that is centered at $z_0$:

$$\mathcal{D}(z_0, r) = \{z \in \mathbb{C} : |z - z_0| < r\}.$$

We say that a subset $\mathcal{D}$ of the complex plane is open if to each $z_0 \in \mathcal{D}$ there
corresponds some $\epsilon > 0$ such that $\mathcal{D}(z_0, \epsilon) \subseteq \mathcal{D}$. Let $g\colon \mathcal{D} \to \mathbb{C}$ be some function
from an open set $\mathcal{D} \subseteq \mathbb{C}$ to $\mathbb{C}$. Let $z_0$ be in $\mathcal{D}$. We say that $g(\cdot)$ is differentiable
at $z_0 \in \mathcal{D}$ and that its derivative at $z_0$ is the complex number $g'(z_0)$, if for every
$\epsilon > 0$ there exists some $\delta > 0$ such that

$$\left|\frac{g(z_0 + h) - g(z_0)}{h} - g'(z_0)\right| < \epsilon, \tag{17.30}$$

whenever the complex number $h \in \mathbb{C}$ satisfies $0 < |h| < \delta$. It is important to note
that here h is complex. If g is differentiable at every $z \in \mathcal{D}$, then we say that g is
holomorphic or analytic in $\mathcal{D}$.³

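Define the mappings

$$u, v\colon \{x, y \in \mathbb{R} : x + iy \in \mathcal{D}\} \to \mathbb{R} \tag{17.31a}$$

by

$$u(x, y) = \operatorname{Re}\bigl(g(x + iy)\bigr), \tag{17.31b}$$

and

$$v(x, y) = \operatorname{Im}\bigl(g(x + iy)\bigr). \tag{17.31c}$$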

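Proposition 17.3.6 (The Cauchy-Riemann Equations). Let $\mathcal{D} \subseteq \mathbb{C}$ be open and
let $g\colon \mathcal{D} \to \mathbb{C}$ be analytic in $\mathcal{D}$. Let u, v be defined by (17.31). Then u and v
satisfy the Cauchy-Riemann equations

$$\frac{\partial u(x, y)}{\partial x} = \frac{\partial v(x, y)}{\partial y}, \tag{17.32a}$$

$$\frac{\partial u(x, y)}{\partial y} = -\frac{\partial v(x, y)}{\partial x} \tag{17.32b}$$

at every $x, y \in \mathbb{R}$ such that $x + iy \in \mathcal{D}$, and

$$g'(z) = \frac{\partial u(x, y)}{\partial x} + i\,\frac{\partial v(x, y)}{\partial x}\,\bigg|_{(x, y) = (\operatorname{Re}(z), \operatorname{Im}(z))}, \quad z \in \mathcal{D}. \tag{17.33}$$

Moreover, the partial derivatives in (17.32) are continuous in the subset of $\mathbb{R}^2$
defined by $\{x, y \in \mathbb{R} : x + iy \in \mathcal{D}\}$.

Proof. See (Rudin, 1974, Chapter 11, Theorem 11.2 & Theorem 11.4) or (Nehari,
1975, Chapter II, Section 5 & Chapter III, Section 3). □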

We can now state the change-of-variables theorem for CRVs. 

Theorem 17.3.7 (Transforming Complex Random Variables). Let $g\colon \mathcal{D} \to \mathcal{R}$ be
a one-to-one mapping from an open subset $\mathcal{D}$ of $\mathbb{C}$ onto a subset $\mathcal{R}$ of $\mathbb{C}$. Assume
that g is analytic in $\mathcal{D}$ and that at no point of $\mathcal{D}$ is the derivative of g zero. Let
the CRV Z have the density function $f_Z(\cdot)$ and satisfy $\Pr[Z \in \mathcal{D}] = 1$. Then the CRV
defined by W = g(Z) is of density

$$f_W(w) = \frac{f_Z(z)}{|g'(z)|^2}\,\bigg|_{z = g^{-1}(w)} \mathrm{I}\{w \in \mathcal{R}\}. \tag{17.34}$$

Here $g^{-1}(w)$ denotes the point in $\mathcal{D}$ that is mapped by g to w.

³ There is some confusion in the literature about the terms analytic, holomorphic, and
regular. We are following here (Rudin, 1974).

Note 17.3.8. The square in (17.34) does not appear in dealing with real random
variables. It appears here because a mapping of complex numbers is essentially
two-dimensional: scaling by $\alpha \in \mathbb{C}$ translates to a scaling of area by $|\alpha|^2$.

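Proof. To prove (17.34) we begin by expressing the function $g(\cdot)$ as

$$g(x + iy) = u(x, y) + i\,v(x, y), \quad \bigl(x, y \in \mathbb{R},\ x + iy \in \mathcal{D}\bigr),$$

where $u(x, y) = \operatorname{Re}(g(x + iy))$ and $v(x, y) = \operatorname{Im}(g(x + iy))$ are defined in (17.31b)
and (17.31c). The density of g(Z) is, by definition, the joint density of the pair
$\bigl(u(\operatorname{Re}(Z), \operatorname{Im}(Z)),\ v(\operatorname{Re}(Z), \operatorname{Im}(Z))\bigr)$. And the joint density of the pair (Re(Z), Im(Z))
is just the density of Z. Thus, if we could relate the joint density of the pair
$\bigl(u(\operatorname{Re}(Z), \operatorname{Im}(Z)),\ v(\operatorname{Re}(Z), \operatorname{Im}(Z))\bigr)$ to the joint density of the pair (Re(Z), Im(Z)),
then we could relate the density of g(Z) to the density of Z.

To relate the joint density of the pair $\bigl(u(\operatorname{Re}(Z), \operatorname{Im}(Z)),\ v(\operatorname{Re}(Z), \operatorname{Im}(Z))\bigr)$ to the
joint density of the pair (Re(Z), Im(Z)) we employ Theorem 17.3.4. To that end
we need to compute the absolute value of the Jacobian determinant. This we do
as follows:

$$\begin{aligned}
\left|\det \begin{pmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[2mm] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{pmatrix}\right|
&= \left|\det \begin{pmatrix} \dfrac{\partial u}{\partial x} & -\dfrac{\partial v}{\partial x} \\[2mm] \dfrac{\partial v}{\partial x} & \dfrac{\partial u}{\partial x} \end{pmatrix}\right| \\
&= \Bigl(\frac{\partial u}{\partial x}\Bigr)^2 + \Bigl(\frac{\partial v}{\partial x}\Bigr)^2 \\
&= \bigl|g'(x + iy)\bigr|^2,
\end{aligned} \tag{17.35}$$

where the first equality follows from the Cauchy-Riemann equations (17.32); the
second from a direct calculation of the determinant of a 2 x 2 matrix; and where
the last equality follows from (17.33). The theorem now follows from (17.35) and
Theorem 17.3.4. □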
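
Identity (17.35) can be checked numerically for a specific analytic function. The sketch below estimates the partial derivatives of u and v by central differences and compares the resulting Jacobian determinant with $|g'(z)|^2$. The choice g = exp (whose derivative is again exp), the evaluation point, and the step size are assumptions made for illustration.

```python
import cmath


def jacobian_det(g, z, h=1e-5):
    """Finite-difference Jacobian determinant of (x, y) -> (u, v) at z = x + iy."""
    dg_dx = (g(z + h) - g(z - h)) / (2 * h)            # u_x + i v_x
    dg_dy = (g(z + 1j * h) - g(z - 1j * h)) / (2 * h)  # u_y + i v_y
    u_x, v_x = dg_dx.real, dg_dx.imag
    u_y, v_y = dg_dy.real, dg_dy.imag
    return u_x * v_y - u_y * v_x


z0 = 0.3 + 0.8j
det_numeric = jacobian_det(cmath.exp, z0)
det_expected = abs(cmath.exp(z0)) ** 2     # |g'(z0)|^2 with g' = exp
print(det_numeric, det_expected)

# Cauchy-Riemann check at the same point: u_x = v_y and u_y = -v_x.
h = 1e-5
dg_dx = (cmath.exp(z0 + h) - cmath.exp(z0 - h)) / (2 * h)
dg_dy = (cmath.exp(z0 + 1j * h) - cmath.exp(z0 - 1j * h)) / (2 * h)
cr_residual = abs(dg_dx.real - dg_dy.imag) + abs(dg_dy.real + dg_dx.imag)
print(cr_residual)
```

The agreement is only up to finite-difference error, so the comparison uses a tolerance rather than exact equality.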



17.4 Complex Random Vectors 

We say that $\mathbf{Z} = (Z^{(1)}, \ldots, Z^{(n)})^{\mathsf{T}}$ is a complex random vector on the probability
space $(\Omega, \mathcal{F}, P)$ if it is a mapping from the outcome set $\Omega$ to $\mathbb{C}^n$ such that the real
vector

$$\bigl(\operatorname{Re}(Z^{(1)}), \operatorname{Im}(Z^{(1)}), \ldots, \operatorname{Re}(Z^{(n)}), \operatorname{Im}(Z^{(n)})\bigr)^{\mathsf{T}}$$

comprising the real and imaginary parts of its components is a real random vector
on $(\Omega, \mathcal{F}, P)$, i.e., if each of the components of $\mathbf{Z}$ is a CRV.

We say that the complex random vector $\mathbf{Z} = (Z^{(1)}, \ldots, Z^{(n)})^{\mathsf{T}}$ and the complex
random vector $\mathbf{W} = (W^{(1)}, \ldots, W^{(n)})^{\mathsf{T}}$ are of equal law (or have the same distribution) and write $\mathbf{Z} \overset{\mathscr{L}}{=} \mathbf{W}$, if the real vector taking value in $\mathbb{R}^{2n}$ whose components
are the real and imaginary parts of the components of $\mathbf{Z}$ has the same distribution
as the analogous vector for $\mathbf{W}$, i.e., if for all $x_1, \ldots, x_n, y_1, \ldots, y_n \in \mathbb{R}$

$$\begin{aligned}
&\Pr\bigl[\operatorname{Re}(Z^{(1)}) \le x_1,\ \operatorname{Im}(Z^{(1)}) \le y_1,\ \ldots,\ \operatorname{Re}(Z^{(n)}) \le x_n,\ \operatorname{Im}(Z^{(n)}) \le y_n\bigr] \\
&\quad = \Pr\bigl[\operatorname{Re}(W^{(1)}) \le x_1,\ \operatorname{Im}(W^{(1)}) \le y_1,\ \ldots,\ \operatorname{Re}(W^{(n)}) \le x_n,\ \operatorname{Im}(W^{(n)}) \le y_n\bigr].
\end{aligned}$$

The expectation of a complex random vector is the vector consisting of the expectation of each of its components. We say that a complex random vector is of
finite variance if each of its components is a CRV of finite variance.

17.4.1 The Covariance Matrix 

The discussion in Section 17.3.5 can be generalized to random complex vectors.
The covariance matrix $\mathsf{K}_{\mathbf{ZZ}}$ of a finite-variance complex random n-vector $\mathbf{Z}$ is
defined as the conjugate-symmetric n x n matrix

$$\mathsf{K}_{\mathbf{ZZ}} = \mathrm{E}\bigl[(\mathbf{Z} - \mathrm{E}[\mathbf{Z}])(\mathbf{Z} - \mathrm{E}[\mathbf{Z}])^\dagger\bigr]. \tag{17.36}$$

Once again, this definition is not consistent with viewing the random complex
vector as a vector of length 2n of real random variables. The latter would have a
real symmetric 2n x 2n covariance matrix.

The reader may wonder why we have chosen to define the covariance and the covariance matrix with the conjugation sign. Why not look at $\mathrm{E}\bigl[(\mathbf{Z} - \mathrm{E}[\mathbf{Z}])(\mathbf{Z} - \mathrm{E}[\mathbf{Z}])^{\mathsf{T}}\bigr]$?
The reason is that (17.36) is simply much more useful in applications. For example,
for any deterministic $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$ the variance of $\sum_{j=1}^{n} \alpha_j Z^{(j)}$ can be computed
from $\mathsf{K}_{\mathbf{ZZ}}$ (using (17.25)) but not from $\mathrm{E}\bigl[(\mathbf{Z} - \mathrm{E}[\mathbf{Z}])(\mathbf{Z} - \mathrm{E}[\mathbf{Z}])^{\mathsf{T}}\bigr]$.

17.4.2 Proper Complex Random Vectors 

The notion of proper random variables extends to vectors: 

Definition 17.4.1 (Proper Complex Random Vector). A complex random vector $\mathbf{Z}$
is said to be proper if the following three conditions are all met: it is of zero mean;
it is of finite variance; and

$$\mathrm{E}\bigl[\mathbf{Z}\mathbf{Z}^{\mathsf{T}}\bigr] = \mathsf{0}. \tag{17.37}$$

An alternative definition can be given based on linear functionals:

Proposition 17.4.2. The complex random n-vector $\mathbf{Z}$ is proper if, and only if, for
every deterministic vector $\boldsymbol{\alpha} \in \mathbb{C}^n$ the CRV $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is proper.

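Proof. We begin by noting that $\mathbf{Z}$ is of zero mean if, and only if, $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is of zero
mean for all $\boldsymbol{\alpha} \in \mathbb{C}^n$. This can be seen from the relation

$$\mathrm{E}\bigl[\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}\bigr] = \boldsymbol{\alpha}^{\mathsf{T}}\mathrm{E}[\mathbf{Z}], \quad \boldsymbol{\alpha} \in \mathbb{C}^n. \tag{17.38}$$

Indeed, (17.38) demonstrates that if $\mathbf{Z}$ is of zero mean then so is $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ for every
$\boldsymbol{\alpha} \in \mathbb{C}^n$. Conversely, if $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is of zero mean for all $\boldsymbol{\alpha} \in \mathbb{C}^n$, then, a fortiori, it must
also be of zero mean for the choice of $\boldsymbol{\alpha} = \mathrm{E}[\mathbf{Z}]^*$, which yields that $0 = \mathrm{E}[\mathbf{Z}]^\dagger\, \mathrm{E}[\mathbf{Z}]$
and hence that $\mathrm{E}[\mathbf{Z}]$ must be zero (because $\mathrm{E}[\mathbf{Z}]^\dagger\, \mathrm{E}[\mathbf{Z}]$ is the sum of the squared
magnitudes of the components of $\mathrm{E}[\mathbf{Z}]$).

We next note that $\mathbf{Z}$ is of finite variance if, and only if, $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is of finite variance
for every $\boldsymbol{\alpha} \in \mathbb{C}^n$. The proof is not difficult and is omitted.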

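We thus continue with the proof under the assumption that $\mathbf{Z}$ is of zero mean and
of finite variance. We note that for any deterministic complex vector $\boldsymbol{\alpha} \in \mathbb{C}^n$

$$\begin{aligned}
\mathrm{E}\bigl[(\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z})^2\bigr] &= \mathrm{E}\bigl[(\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z})(\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z})\bigr] \\
&= \mathrm{E}\bigl[(\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z})(\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z})^{\mathsf{T}}\bigr] \\
&= \mathrm{E}\bigl[\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}\mathbf{Z}^{\mathsf{T}}\boldsymbol{\alpha}\bigr] \\
&= \boldsymbol{\alpha}^{\mathsf{T}}\,\mathrm{E}\bigl[\mathbf{Z}\mathbf{Z}^{\mathsf{T}}\bigr]\,\boldsymbol{\alpha}, \quad \boldsymbol{\alpha} \in \mathbb{C}^n,
\end{aligned} \tag{17.39}$$

where the first equality follows by writing the square of a random variable as the
product of the variable by itself; the second because the transpose of a scalar is
the original scalar; the third by the transpose rule

$$(\mathsf{A}\mathsf{B})^{\mathsf{T}} = \mathsf{B}^{\mathsf{T}}\mathsf{A}^{\mathsf{T}}, \tag{17.40}$$

and the final equality because $\boldsymbol{\alpha}$ is deterministic.

From (17.39) it follows that if $\mathbf{Z}$ is proper, then so is $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$, for all $\boldsymbol{\alpha} \in \mathbb{C}^n$. Actually,
(17.39) also proves the reverse implication by substituting $\mathsf{A} = \mathrm{E}[\mathbf{Z}\mathbf{Z}^{\mathsf{T}}]$ in the
following fact from Matrix Theory:

$$\bigl(\boldsymbol{\alpha}^{\mathsf{T}}\mathsf{A}\boldsymbol{\alpha} = 0, \quad \boldsymbol{\alpha} \in \mathbb{C}^n\bigr) \implies \bigl(\mathsf{A} = \mathsf{0}\bigr), \quad \mathsf{A} \text{ symmetric}. \tag{17.41}$$

To prove this fact from Matrix Theory assume that A is symmetric, i.e., that

$$a^{(j,\ell)} = a^{(\ell,j)}, \quad j, \ell \in \{1, \ldots, n\}. \tag{17.42}$$

Let $\boldsymbol{\alpha} = \mathbf{e}_\ell$ where $\mathbf{e}_\ell$ is all-zero except for its $\ell$-th component, which is one. The
equality $\mathbf{e}_\ell^{\mathsf{T}}\mathsf{A}\mathbf{e}_\ell = 0$ for every $\ell \in \{1, \ldots, n\}$ is equivalent to

$$a^{(\ell,\ell)} = 0, \quad \ell \in \{1, \ldots, n\}. \tag{17.43}$$

Next choose $\boldsymbol{\alpha} = \mathbf{e}_j + \mathbf{e}_\ell$. The equality

$$(\mathbf{e}_j + \mathbf{e}_\ell)^{\mathsf{T}}\mathsf{A}(\mathbf{e}_j + \mathbf{e}_\ell) = 0$$

for every $j, \ell \in \{1, \ldots, n\}$ is then equivalent to

$$a^{(j,j)} + a^{(j,\ell)} + a^{(\ell,j)} + a^{(\ell,\ell)} = 0, \quad j, \ell \in \{1, \ldots, n\}. \tag{17.44}$$

Equations (17.42), (17.43), and (17.44) guarantee that the matrix A is all-zero. □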

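An important observation regarding complex random vectors is that a linearly-transformed proper vector is also proper:

Proposition 17.4.3 (Linear Transformation of a Proper Random Vector). If the
complex random n-vector $\mathbf{Z}$ is proper, then so is the complex random m-vector $\mathsf{A}\mathbf{Z}$
for every deterministic m x n complex matrix A.

Proof. We leave it to the reader to verify that the hypothesis that $\mathbf{Z}$ is proper
implies that $\mathsf{A}\mathbf{Z}$ must be of zero mean and of finite variance. To show that $\mathsf{A}\mathbf{Z}$
is proper, it thus remains to show that $\mathrm{E}\bigl[(\mathsf{A}\mathbf{Z})(\mathsf{A}\mathbf{Z})^{\mathsf{T}}\bigr] = \mathsf{0}$. This we do by direct
calculation:

$$\begin{aligned}
\mathrm{E}\bigl[(\mathsf{A}\mathbf{Z})(\mathsf{A}\mathbf{Z})^{\mathsf{T}}\bigr] &= \mathrm{E}\bigl[\mathsf{A}\mathbf{Z}\mathbf{Z}^{\mathsf{T}}\mathsf{A}^{\mathsf{T}}\bigr] \\
&= \mathsf{A}\,\mathrm{E}\bigl[\mathbf{Z}\mathbf{Z}^{\mathsf{T}}\bigr]\,\mathsf{A}^{\mathsf{T}} \\
&= \mathsf{0},
\end{aligned}$$

where the first equality follows from the rule for the transpose of a product, namely,
$(\mathsf{A}\mathsf{B})^{\mathsf{T}} = \mathsf{B}^{\mathsf{T}}\mathsf{A}^{\mathsf{T}}$; the second because A is deterministic; and the last from the
hypothesis that $\mathbf{Z}$ is proper, so $\mathrm{E}[\mathbf{Z}\mathbf{Z}^{\mathsf{T}}] = \mathsf{0}$. □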
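
Proposition 17.4.3 can be illustrated by exhaustive enumeration. In the sketch below the random 2-vector has independent components, each equiprobable on the four values ±1, ±i (so the vector is proper), and A is an arbitrary 3 x 2 complex matrix. The matrix entries and the helper names are hypothetical.

```python
from itertools import product

symbols = [1 + 0j, -1 + 0j, 1j, -1j]
outcomes = list(product(symbols, repeat=2))   # 16 equiprobable realizations of Z
prob = 1 / len(outcomes)


def transform(A, z):
    """Matrix-vector product A z."""
    return [sum(a * zj for a, zj in zip(row, z)) for row in A]


def mean_vector(vectors):
    n = len(vectors[0])
    return [sum(prob * v[j] for v in vectors) for j in range(n)]


def pseudo_covariance(vectors):
    """E[V V^T] (no conjugation) for equiprobable realizations."""
    n = len(vectors[0])
    return [[sum(prob * v[j] * v[l] for v in vectors) for l in range(n)]
            for j in range(n)]


A = [[1 + 2j, -1j],
     [0.5 + 0j, 3 - 1j],
     [2j, 1 + 0j]]

transformed = [transform(A, z) for z in outcomes]

max_mean_Z = max(abs(x) for x in mean_vector(outcomes))
max_pseudo_Z = max(abs(x) for row in pseudo_covariance(outcomes) for x in row)
max_mean_AZ = max(abs(x) for x in mean_vector(transformed))
max_pseudo_AZ = max(abs(x) for row in pseudo_covariance(transformed) for x in row)
print(max_mean_Z, max_pseudo_Z, max_mean_AZ, max_pseudo_AZ)
```

All four printed quantities should vanish up to rounding, confirming that both Z and AZ have zero mean and zero E[VV^T] in this example.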



17.4.3 The Characteristic Function 

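The definition we gave in Section 17.3.6 for the characteristic function of a CRV
extends naturally to vectors: the characteristic function $\Phi_{\mathbf{Z}}\colon \mathbb{C}^n \to \mathbb{C}$ of a complex
random n-vector $\mathbf{Z}$ is defined as

$$\Phi_{\mathbf{Z}}(\boldsymbol{\varpi}) = \mathrm{E}\bigl[e^{i\operatorname{Re}(\boldsymbol{\varpi}^\dagger \mathbf{Z})}\bigr], \quad \boldsymbol{\varpi} \in \mathbb{C}^n.$$

Invoking the analogous result for tuples of real random variables we have:

Theorem 17.4.4. The complex random vectors $\mathbf{Z}$ and $\mathbf{W}$ are of equal law if, and
only if, their characteristic functions are identical:

$$\bigl(\mathbf{Z} \overset{\mathscr{L}}{=} \mathbf{W}\bigr) \iff \bigl(\Phi_{\mathbf{Z}}(\boldsymbol{\varpi}) = \Phi_{\mathbf{W}}(\boldsymbol{\varpi}), \quad \boldsymbol{\varpi} \in \mathbb{C}^n\bigr). \tag{17.45}$$

Corollary 17.4.5. The complex random n-vectors $\mathbf{Z}$ and $\mathbf{W}$ are of equal law if, and
only if, for every deterministic vector $\boldsymbol{\alpha} \in \mathbb{C}^n$ the complex random variables $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$
and $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{W}$ are of equal law:

$$\bigl(\mathbf{Z} \overset{\mathscr{L}}{=} \mathbf{W}\bigr) \iff \bigl(\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z} \overset{\mathscr{L}}{=} \boldsymbol{\alpha}^{\mathsf{T}}\mathbf{W}, \quad \boldsymbol{\alpha} \in \mathbb{C}^n\bigr). \tag{17.46}$$

Proof. The direction that needs proof is that equality in law of all linear combinations implies equality in law between the vectors. But this readily follows from
the theorem because equality in law of the linear combinations implies that the
law of $\boldsymbol{\varpi}^\dagger\mathbf{Z}$ is equal to the law of $\boldsymbol{\varpi}^\dagger\mathbf{W}$ for every $\boldsymbol{\varpi} \in \mathbb{C}^n$. This in turn implies
$e^{i\operatorname{Re}(\boldsymbol{\varpi}^\dagger\mathbf{Z})} \overset{\mathscr{L}}{=} e^{i\operatorname{Re}(\boldsymbol{\varpi}^\dagger\mathbf{W})}$, from which, upon taking expectations, we obtain that $\mathbf{Z}$
and $\mathbf{W}$ have identical characteristic functions. Thus, by the theorem, they are
equal in law. □

17.4.4 Transforming Complex Random Vectors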

The change of density rule (17.34) can be generalized to analytic multi- variable 
mappings (Exercise 17.6). But here we shall only present a version of this result 
for linear mappings: 

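Lemma 17.4.6 (Linearly Transforming Complex Random Vectors). Let the complex
random n-vector $\mathbf{W}$ be given by

\[ \mathbf{W} = \mathsf{A}\mathbf{Z}, \]

where $\mathsf{A}$ is a nonsingular deterministic complex $n \times n$ matrix, and where the complex
random n-vector $\mathbf{Z}$ has the density $f_{\mathbf{Z}}(\cdot)$. Then $\mathbf{W}$ is of density

\[ f_{\mathbf{W}}(\mathbf{w}) = \frac{1}{|\det \mathsf{A}|^{2}} \, f_{\mathbf{Z}}\bigl(\mathsf{A}^{-1}\mathbf{w}\bigr), \quad \mathbf{w} \in \mathbb{C}^n. \tag{17.47} \]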

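Proof. The proof is based on viewing the complex $n \times n$ linear transformation
from $\mathbf{Z}$ to $\mathbf{W}$ as a $2n \times 2n$ real transformation, and on then applying Theorem 17.3.4.

Stack the real parts of the components of $\mathbf{Z}$ on top of the imaginary parts in a real
random 2n-vector $\mathbf{S}$:

\[ \mathbf{S} = \bigl( \operatorname{Re}(Z^{(1)}), \ldots, \operatorname{Re}(Z^{(n)}), \operatorname{Im}(Z^{(1)}), \ldots, \operatorname{Im}(Z^{(n)}) \bigr)^{T}. \tag{17.48} \]

Similarly, stack the real parts of the components of $\mathbf{W}$ on top of the imaginary
parts in a real random 2n-vector $\mathbf{T}$:

\[ \mathbf{T} = \bigl( \operatorname{Re}(W^{(1)}), \ldots, \operatorname{Re}(W^{(n)}), \operatorname{Im}(W^{(1)}), \ldots, \operatorname{Im}(W^{(n)}) \bigr)^{T}. \]

We can then express $\mathbf{T}$ as the result of multiplying the random vector $\mathbf{S}$ by a
$2n \times 2n$ real matrix:

\[ \mathbf{T} = \begin{pmatrix} \operatorname{Re}(\mathsf{A}) & -\operatorname{Im}(\mathsf{A}) \\ \operatorname{Im}(\mathsf{A}) & \operatorname{Re}(\mathsf{A}) \end{pmatrix} \mathbf{S}, \]

where $\operatorname{Re}(\mathsf{A})$ and $\operatorname{Im}(\mathsf{A})$ denote the componentwise real and imaginary parts of $\mathsf{A}$.

The result will follow from Theorem 17.3.4 once we show that the absolute value
of the Jacobian determinant of this transformation is $|\det \mathsf{A}|^{2}$. Using elementary
row and column operations we compute:

\[
\begin{aligned}
\det \begin{pmatrix} \operatorname{Re}(\mathsf{A}) & -\operatorname{Im}(\mathsf{A}) \\ \operatorname{Im}(\mathsf{A}) & \operatorname{Re}(\mathsf{A}) \end{pmatrix}
&= \det \begin{pmatrix} \mathsf{A} & -\operatorname{Im}(\mathsf{A}) \\ -\mathrm{i}\mathsf{A} & \operatorname{Re}(\mathsf{A}) \end{pmatrix} \\
&= \det \begin{pmatrix} \mathsf{A} & -\operatorname{Im}(\mathsf{A}) \\ \mathsf{0} & \mathsf{A}^{*} \end{pmatrix} \\
&= (\det \mathsf{A})(\det \mathsf{A}^{*}) \\
&= |\det \mathsf{A}|^{2},
\end{aligned}
\]

where the first equality follows by the elementary column operations of multiplying
the right columns by $(-\mathrm{i})$ and adding the result to the left columns; the second
from the elementary row operations of multiplying the top rows by $\mathrm{i}$ and adding
the result to the bottom rows; the third from the identity

\[ \det \begin{pmatrix} \mathsf{B} & \mathsf{C} \\ \mathsf{0} & \mathsf{D} \end{pmatrix} = (\det \mathsf{B})(\det \mathsf{D}); \]

and the last by noting that for any square matrix $\mathsf{B}$

\[ \det(\mathsf{B}^{*}) = (\det \mathsf{B})^{*}. \]  □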
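
The determinant identity at the heart of the proof can be checked numerically. The following is only a sketch: the matrix `A` is an arbitrary illustrative choice, and the helper names `det` and `real_representation` are introduced here and are not part of the text.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting.

    Works for real or complex entries.
    """
    m = [list(row) for row in matrix]
    n = len(m)
    result = 1
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) == 0:
            return 0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def real_representation(a):
    """Return the 2n x 2n real matrix [[Re A, -Im A], [Im A, Re A]]."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in a]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in a]
    return top + bottom


# Arbitrary nonsingular complex 2 x 2 matrix (an assumption for illustration).
A = [[1 + 2j, 0.5 - 1j], [-1j, 2 + 0.5j]]

jacobian = det(real_representation(A))  # determinant of the real 4 x 4 map
target = abs(det(A)) ** 2               # |det A|^2 as claimed in the proof
```

For this particular `A` one finds det A = 2 + 5i, so both quantities should equal 29 up to rounding.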

17.5 Discrete-Time Complex Stochastic Processes 

Definition 12.2.1 of a real stochastic process extends to the complex case as follows. 

Definition 17.5.1 (Complex Stochastic Process). A complex stochastic process
(CSP) $(Z(t),\ t \in \mathcal{T})$ is a collection of complex random variables that are
defined on a common probability space $(\Omega, \mathcal{F}, P)$ and that are indexed by some
set $\mathcal{T}$.

A CSP $(Z(t),\ t \in \mathcal{T})$ is said to be centered if for each $t \in \mathcal{T}$ the CRV $Z(t)$ is of
zero mean. Similarly, the CSP is said to be of finite variance if for each $t \in \mathcal{T}$ the
CRV $Z(t)$ is of finite variance. A discrete-time CSP corresponds to the case where
the index set $\mathcal{T}$ is the set of integers $\mathbb{Z}$. Discrete-time complex stochastic processes
are not very different from the real-valued ones we encountered in Chapter 13.
Consequently we shall present the main definitions and results succinctly with
an emphasis on the issues where the complex and real processes differ. As in
Chapter 13, when dealing with a discrete-time CSP we shall use subscripts to
index the complex random variables and denote the process by $(Z_\nu,\ \nu \in \mathbb{Z})$ or,
more succinctly, by $(Z_\nu)$.

A discrete-time CSP $(Z_\nu,\ \nu \in \mathbb{Z})$ is said to be stationary, or strict-sense
stationary, or strongly stationary if for every positive integer n and for every
$\eta, \eta' \in \mathbb{Z}$, the joint distribution of the n-tuple $(Z_{\eta}, \ldots, Z_{\eta+n-1})$ is identical to the
joint distribution of the n-tuple $(Z_{\eta'}, \ldots, Z_{\eta'+n-1})$. This definition is essentially
identical to the analogous definition for real processes (Definition 13.2.1). Similarly,
Proposition 13.2.2 holds verbatim also for complex stochastic processes.
Proposition 13.2.3 also holds for complex stochastic processes with the slight modification
that the deterministic coefficients $\alpha_1, \ldots, \alpha_n$ are now allowed to be arbitrary
complex numbers:

Proposition 17.5.2. A discrete-time CSP $(Z_\nu)$ is stationary if, and only if, for
every $n \in \mathbb{N}$, all $\eta, \nu_1, \ldots, \nu_n \in \mathbb{Z}$, and all $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$,

\[ \sum_{j=1}^{n} \alpha_j Z_{\nu_j} \stackrel{\mathcal{L}}{=} \sum_{j=1}^{n} \alpha_j Z_{\nu_j+\eta}. \tag{17.49} \]


The definition of a wide-sense stationary CSP is very similar to the analogous 
definition for real processes (Definition 13.3.1). 

Definition 17.5.3 (Wide-Sense Stationary Discrete-Time CSP). We say that
a discrete-time CSP $(Z_\nu)$ is wide-sense stationary or weakly stationary or
covariance stationary if the following three conditions all hold:

1) For every $\nu \in \mathbb{Z}$ the CRV $Z_\nu$ is of finite variance.

2) The mean of $Z_\nu$ does not depend on $\nu$.

3) The expectation $\mathsf{E}[Z_{\nu} Z_{\nu'}^{*}]$ depends on $\nu'$ and $\nu$ only via their difference $\nu - \nu'$:

\[ \mathsf{E}\bigl[Z_{\nu} Z_{\nu'}^{*}\bigr] = \mathsf{E}\bigl[Z_{\nu+\eta} Z_{\nu'+\eta}^{*}\bigr], \quad \nu, \nu', \eta \in \mathbb{Z}. \tag{17.50} \]

Note the conjugation in (17.50). We do not require that $\mathsf{E}[Z_{\nu'} Z_{\nu}]$ be computable
from $\nu - \nu'$; it may or may not be. Thus, we do not require that the matrix

\[ \begin{pmatrix} \mathsf{E}[\operatorname{Re}(Z_{\nu'})\operatorname{Re}(Z_{\nu})] & \mathsf{E}[\operatorname{Re}(Z_{\nu'})\operatorname{Im}(Z_{\nu})] \\ \mathsf{E}[\operatorname{Im}(Z_{\nu'})\operatorname{Re}(Z_{\nu})] & \mathsf{E}[\operatorname{Im}(Z_{\nu'})\operatorname{Im}(Z_{\nu})] \end{pmatrix} \]

be computable from $\nu - \nu'$. This matrix is, however, computable from $\nu - \nu'$ if the
process is proper:

Definition 17.5.4 (Proper CSP). A discrete-time CSP $(Z_\nu)$ is said to be proper
if the following three conditions all hold: it is centered; it is of finite variance; and

\[ \mathsf{E}[Z_{\nu} Z_{\nu'}] = 0, \quad \nu, \nu' \in \mathbb{Z}. \tag{17.51} \]

Equivalently, a discrete-time CSP $(Z_\nu)$ is proper if, and only if, for every positive
integer n and all $\nu_1, \ldots, \nu_n \in \mathbb{Z}$ the complex random vector $(Z_{\nu_1}, \ldots, Z_{\nu_n})^{T}$ is
proper. Equivalently, $(Z_\nu)$ is proper if, and only if, for every positive integer n, all
$\alpha_1, \ldots, \alpha_n \in \mathbb{C}$, and all $\nu_1, \ldots, \nu_n \in \mathbb{Z}$

\[ \sum_{j=1}^{n} \alpha_j Z_{\nu_j} \ \text{is proper} \tag{17.52} \]

(Proposition 17.4.2).

The alternative definition of WSS real processes in terms of the variance of linear 
functionals of the process (Proposition 13.3.3) requires little change: 

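Proposition 17.5.5. A finite-variance discrete-time CSP $(Z_\nu)$ is WSS if and only
if for every $n \in \mathbb{N}$, all $\eta, \nu_1, \ldots, \nu_n \in \mathbb{Z}$, and all $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$

\[ \sum_{j=1}^{n} \alpha_j Z_{\nu_j} \ \text{and} \ \sum_{j=1}^{n} \alpha_j Z_{\nu_j+\eta} \ \text{have the same mean \& variance.} \tag{17.53} \]

Proof. We begin by assuming that $(Z_\nu)$ is WSS and prove (17.53). The equality
of expectations follows directly from the linearity of expectation and from the fact
that because $(Z_\nu)$ is WSS the mean of $Z_\nu$ does not depend on $\nu$. In proving the
equality of the variances we use (17.25):

\[
\begin{aligned}
\operatorname{Var}\Bigl[\sum_{j=1}^{n} \alpha_j Z_{\nu_j}\Bigr]
&= \sum_{j=1}^{n}\sum_{j'=1}^{n} \alpha_j \alpha_{j'}^{*} \operatorname{Cov}\bigl[Z_{\nu_j}, Z_{\nu_{j'}}\bigr] \\
&= \sum_{j=1}^{n}\sum_{j'=1}^{n} \alpha_j \alpha_{j'}^{*} \operatorname{Cov}\bigl[Z_{\nu_j+\eta}, Z_{\nu_{j'}+\eta}\bigr] \\
&= \operatorname{Var}\Bigl[\sum_{j=1}^{n} \alpha_j Z_{\nu_j+\eta}\Bigr],
\end{aligned}
\]

where the second equality follows from the wide-sense stationarity of $(Z_\nu)$ and the
last equality again from (17.25).

We next turn to proving that (17.53) implies that $(Z_\nu)$ is WSS. Choosing n = 1 and
$\alpha_1 = 1$ we obtain, by considering the equality of the means, that $\mathsf{E}[Z_\nu] = \mathsf{E}[Z_{\nu+\eta}]$
for all $\nu, \eta \in \mathbb{Z}$, i.e., that the mean of the process is constant. And, by considering
the equality of the variances, we obtain that the random variables $(Z_\nu)$ all have
the same variance

\[ \operatorname{Var}[Z_\nu] = \operatorname{Var}[Z_{\nu+\eta}], \quad \nu, \eta \in \mathbb{Z}. \tag{17.54} \]

Choosing n = 2 and $\alpha_1 = \alpha_2 = 1$ we obtain from the equality of the variances

\[ \operatorname{Var}[Z_{\nu_1} + Z_{\nu_2}] = \operatorname{Var}[Z_{\nu_1+\eta} + Z_{\nu_2+\eta}]. \tag{17.55} \]

But, by (17.25) and (17.54),

\[ \operatorname{Var}[Z_{\nu_1} + Z_{\nu_2}] = 2\operatorname{Var}[Z_1] + 2\operatorname{Re}\bigl(\operatorname{Cov}[Z_{\nu_1}, Z_{\nu_2}]\bigr) \tag{17.56} \]

and similarly

\[ \operatorname{Var}[Z_{\nu_1+\eta} + Z_{\nu_2+\eta}] = 2\operatorname{Var}[Z_1] + 2\operatorname{Re}\bigl(\operatorname{Cov}[Z_{\nu_1+\eta}, Z_{\nu_2+\eta}]\bigr). \tag{17.57} \]

By (17.55), (17.56), and (17.57)

\[ \operatorname{Re}\bigl(\operatorname{Cov}[Z_{\nu_1+\eta}, Z_{\nu_2+\eta}]\bigr) = \operatorname{Re}\bigl(\operatorname{Cov}[Z_{\nu_1}, Z_{\nu_2}]\bigr), \quad \eta, \nu_1, \nu_2 \in \mathbb{Z}. \tag{17.58} \]

We now repeat the argument with $\alpha_1 = 1$ and $\alpha_2 = \mathrm{i}$:

\[
\begin{aligned}
\operatorname{Var}[Z_{\nu_1} + \mathrm{i} Z_{\nu_2}]
&= \operatorname{Var}[Z_{\nu_1}] + \operatorname{Var}[Z_{\nu_2}] + 2\operatorname{Re}\bigl(\operatorname{Cov}[Z_{\nu_1}, \mathrm{i} Z_{\nu_2}]\bigr) \\
&= 2\operatorname{Var}[Z_1] + 2\operatorname{Im}\bigl(\operatorname{Cov}[Z_{\nu_1}, Z_{\nu_2}]\bigr)
\end{aligned}
\]

and similarly

\[ \operatorname{Var}[Z_{\nu_1+\eta} + \mathrm{i} Z_{\nu_2+\eta}] = 2\operatorname{Var}[Z_1] + 2\operatorname{Im}\bigl(\operatorname{Cov}[Z_{\nu_1+\eta}, Z_{\nu_2+\eta}]\bigr), \]

so the equality of the variances implies

\[ \operatorname{Im}\bigl(\operatorname{Cov}[Z_{\nu_1+\eta}, Z_{\nu_2+\eta}]\bigr) = \operatorname{Im}\bigl(\operatorname{Cov}[Z_{\nu_1}, Z_{\nu_2}]\bigr), \quad \eta, \nu_1, \nu_2 \in \mathbb{Z}, \]

which combines with (17.58) to prove $\operatorname{Cov}[Z_{\nu_1+\eta}, Z_{\nu_2+\eta}] = \operatorname{Cov}[Z_{\nu_1}, Z_{\nu_2}]$. □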
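
The two variance expansions used in the proof, one with coefficients (1, 1) and one with (1, i), hold for any pair of finite-variance CRVs. The sketch below checks them exactly on a small made-up joint distribution; the sample values and the helper names `mean`, `cov`, and `var` are assumptions introduced for illustration only.

```python
# Hypothetical joint distribution: four equally likely outcomes (z1, z2).
pairs = [(1 + 1j, 2 - 1j), (-1 + 0.5j, 1j), (0.5 - 2j, -1 - 1j), (-0.5 + 0.5j, -1 + 1j)]


def mean(values):
    return sum(values) / len(values)


def cov(xs, ys):
    """Cov[X, Y] = E[(X - E[X]) (Y - E[Y])^*]; note the conjugate on Y."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my).conjugate() for x, y in zip(xs, ys)])


def var(xs):
    return cov(xs, xs).real


z1 = [p[0] for p in pairs]
z2 = [p[1] for p in pairs]

# Coefficients (1, 1): the cross term is twice the REAL part of the covariance.
lhs_sum = var([a + b for a, b in pairs])
rhs_sum = var(z1) + var(z2) + 2 * cov(z1, z2).real

# Coefficients (1, i): the cross term is twice the IMAGINARY part.
lhs_rot = var([a + 1j * b for a, b in pairs])
rhs_rot = var(z1) + var(z2) + 2 * cov(z1, z2).imag
```

Between them the two choices of coefficients expose the real and the imaginary part of the covariance, which is why the proof needs both.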

As with real processes, a comparison of Propositions 17.5.5 and 17.5.2 yields that
any finite-variance stationary CSP is also WSS. The reverse is not true.

Definition 17.5.6 (Autocovariance Function). We define the autocovariance function
$K_{ZZ} \colon \mathbb{Z} \to \mathbb{C}$ of a discrete-time WSS CSP $(Z_\nu)$ as⁴

\[
\begin{aligned}
K_{ZZ}(\eta) &\triangleq \operatorname{Cov}[Z_{\nu+\eta}, Z_{\nu}] \\
&= \mathsf{E}\bigl[ (Z_{\nu+\eta} - \mathsf{E}[Z_1]) (Z_{\nu} - \mathsf{E}[Z_1])^{*} \bigr], \quad \eta \in \mathbb{Z}.
\end{aligned}
\tag{17.59}
\]

By mimicking the derivations of (13.12) (taking into account the conjugate symmetry
(17.18)) we obtain that the autocovariance function $K_{ZZ}$ of every discrete-time
WSS CSP $(Z_\nu)$ satisfies the conjugate-symmetry condition

\[ K_{ZZ}(-\eta) = K_{ZZ}^{*}(\eta), \quad \eta \in \mathbb{Z}. \tag{17.60} \]

Similarly, by mimicking the derivation of (13.13) (i.e., from the nonnegativity of
the variance and from (17.25)), we obtain that the autocovariance function of such
a process satisfies

\[ \sum_{\nu=1}^{n}\sum_{\nu'=1}^{n} \alpha_{\nu} \alpha_{\nu'}^{*} K_{ZZ}(\nu - \nu') \ge 0, \quad \alpha_1, \ldots, \alpha_n \in \mathbb{C}. \tag{17.61} \]

In analogy to the real case, (17.60) and (17.61) fully characterize the possible
autocovariance functions in the sense that any function $K \colon \mathbb{Z} \to \mathbb{C}$ satisfying

\[ K(-\eta) = K^{*}(\eta), \quad \eta \in \mathbb{Z} \tag{17.62} \]

and

\[ \sum_{\nu=1}^{n}\sum_{\nu'=1}^{n} \alpha_{\nu} \alpha_{\nu'}^{*} K(\nu - \nu') \ge 0, \quad \alpha_1, \ldots, \alpha_n \in \mathbb{C} \tag{17.63} \]

is the autocovariance function of some discrete-time WSS CSP.⁵ If $K \colon \mathbb{Z} \to \mathbb{C}$
satisfies (17.62) and (17.63), then we say that $K(\cdot)$ is a positive definite function
from the integers to the complex field.
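
To make the two conditions concrete, the sketch below tests them on a candidate function built as a nonnegative mixture of two complex exponentials. The masses, frequencies, and coefficients are arbitrary illustrative assumptions, and the indices run from 0 to n-1 rather than 1 to n, which leaves the differences unchanged.

```python
import cmath


def autocov(eta, masses=((0.7, 0.1), (0.3, -0.35))):
    """K(eta) = sum_k p_k exp(-i 2 pi eta theta_k) for hypothetical (p_k, theta_k)."""
    return sum(p * cmath.exp(-2j * cmath.pi * eta * theta) for p, theta in masses)


def quadratic_form(alphas):
    """Double sum of (17.63): sum over nu, nu' of alpha_nu alpha_nu'^* K(nu - nu')."""
    n = len(alphas)
    return sum(alphas[v] * alphas[w].conjugate() * autocov(v - w)
               for v in range(n) for w in range(n))


alphas = [1 - 2j, 0.5j, -3 + 1j, 2 + 0j]  # arbitrary complex coefficients
q = quadratic_form(alphas)                # should be real and nonnegative
```

For this family the quadratic form equals a weighted sum of squared magnitudes, so nonnegativity is guaranteed; a numerical check of this kind can refute positive definiteness but cannot prove it for a general candidate.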

Definition 13.6.1 of the power spectral density $S_{ZZ}$ requires no change. We
require that $S_{ZZ}$ be integrable on the interval $[-1/2, 1/2)$ and that

\[ K_{ZZ}(\eta) = \int_{-1/2}^{1/2} S_{ZZ}(\theta)\, e^{-\mathrm{i}2\pi\eta\theta} \, d\theta, \quad \eta \in \mathbb{Z}. \tag{17.64} \]

Proposition 13.6.3 does require some alteration. Indeed, for complex stochastic
processes the PSD need not be a symmetric function. However, the main result
(that the PSD is real and nonnegative) remains true:



⁴ Some authors, e.g., (Grimmett and Stirzaker, 2001), define $K_{ZZ}(m)$ as $\operatorname{Cov}[Z_{\nu}, Z_{\nu+m}]$. Our
definition follows (Doob, 1990).

⁵ In fact, it is the autocovariance function of some proper Gaussian stochastic process.
Complex Gaussian random processes will be discussed in Chapter 24.

Proposition 17.5.7 (PSDs of Complex Processes Are Nonnegative).

(i) If the discrete-time WSS CSP $(Z_\nu)$ is of PSD $S_{ZZ}$, then

\[ S_{ZZ}(\theta) \ge 0, \tag{17.65} \]

except possibly on a subset of the interval $[-1/2, 1/2)$ of Lebesgue measure
zero.

(ii) If a function $S \colon [-1/2, 1/2) \to \mathbb{R}$ is integrable and nonnegative, then there
exists a proper discrete-time WSS CSP⁶ $(Z_\nu)$ whose PSD $S_{ZZ}$ is given by

\[ S_{ZZ}(\theta) = S(\theta), \quad \theta \in [-1/2, 1/2). \]

As in the real case, by possibly changing the value of $S_{ZZ}$ on the set of Lebesgue
measure zero where (17.65) is violated, we can obtain a power spectral density that
is nonnegative for all $\theta \in [-1/2, 1/2)$. Consequently, we shall always assume that
the PSD, if it exists, is nonnegative for all $\theta \in [-1/2, 1/2)$.

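Proof. We begin with Part (i) where we need to prove the nonnegativity of the
PSD. We shall only sketch the proof. We recommend reading the appendix through
Theorem A.2.2 before reading this proof.

Let $K_{ZZ}$ denote the autocovariance function of the WSS CSP $(Z_\nu)$. Applying
(17.61) with

\[ \alpha_{\nu} = e^{\mathrm{i}2\pi\nu\theta}, \quad \nu \in \{1, \ldots, n\} \]

and thus with $\alpha_{\nu}\alpha_{\nu'}^{*} = e^{\mathrm{i}2\pi(\nu-\nu')\theta}$, we obtain

\[
\begin{aligned}
0 &\le \sum_{\nu=1}^{n}\sum_{\nu'=1}^{n} \alpha_{\nu} \alpha_{\nu'}^{*} K_{ZZ}(\nu - \nu') \\
&= \sum_{\nu=1}^{n}\sum_{\nu'=1}^{n} e^{\mathrm{i}2\pi(\nu-\nu')\theta} K_{ZZ}(\nu - \nu') \\
&= \sum_{\eta=-(n-1)}^{n-1} \bigl(n - |\eta|\bigr)\, e^{\mathrm{i}2\pi\eta\theta} K_{ZZ}(\eta), \quad \theta \in [-1/2, 1/2).
\end{aligned}
\]

Dividing by n we obtain

\[
\begin{aligned}
0 &\le \sum_{\eta=-(n-1)}^{n-1} \Bigl(1 - \frac{|\eta|}{n}\Bigr) e^{\mathrm{i}2\pi\eta\theta} K_{ZZ}(\eta) \\
&= \sum_{\eta=-(n-1)}^{n-1} \Bigl(1 - \frac{|\eta|}{n}\Bigr) e^{\mathrm{i}2\pi\eta\theta} \hat{S}_{ZZ}(\eta) \\
&= (k_{n-1} \star S_{ZZ})(\theta), \quad \theta \in [-1/2, 1/2),
\end{aligned}
\]

where in the equality on the second line $\hat{S}_{ZZ}(\eta)$ denotes the $\eta$-th Fourier Series
Coefficient of $S_{ZZ}$ and we use (17.64); and in the subsequent equality on the third
line $k_n$ denotes the degree-n Fejér kernel (Definition A.1.3).

We have thus established that $k_{n-1} \star S_{ZZ}$ is nonnegative. The result now follows
from Theorem A.2.2 which guarantees that

\[ \lim_{n \to \infty} \int_{-1/2}^{1/2} \bigl| S_{ZZ}(\theta) - (k_n \star S_{ZZ})(\theta) \bigr| \, d\theta = 0. \]



⁶ The process can be taken to be Gaussian; see Chapter 24.
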



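The proof of Part (ii) is very similar to the proof of the analogous result for real
processes. As in (13.21), we define

\[ K(\eta) \triangleq \int_{-1/2}^{1/2} S(\theta)\, e^{-\mathrm{i}2\pi\eta\theta} \, d\theta, \quad \eta \in \mathbb{Z}, \tag{17.66} \]

and we prove that this function satisfies (17.62) and (17.63). To prove (17.62) we
compute

\[
\begin{aligned}
K(-\eta) &= \int_{-1/2}^{1/2} S(\theta)\, e^{-\mathrm{i}2\pi(-\eta)\theta} \, d\theta \\
&= \int_{-1/2}^{1/2} S^{*}(\theta)\, e^{\mathrm{i}2\pi\eta\theta} \, d\theta \\
&= \Bigl( \int_{-1/2}^{1/2} S(\theta)\, e^{-\mathrm{i}2\pi\eta\theta} \, d\theta \Bigr)^{*} \\
&= K^{*}(\eta), \quad \eta \in \mathbb{Z},
\end{aligned}
\]

where the first equality follows from the definition of $K(\cdot)$ (17.66); the second
because $S(\cdot)$ is, by assumption, real; the third because conjugating the integrand
is equivalent to conjugating the integral; and the final equality again by (17.66).

To prove (17.63) we mimic the derivation of (13.22) with the constants $\alpha_1, \ldots, \alpha_n$
now being complex:

\[
\begin{aligned}
\sum_{\nu=1}^{n}\sum_{\nu'=1}^{n} \alpha_{\nu} \alpha_{\nu'}^{*} K(\nu - \nu')
&= \sum_{\nu=1}^{n}\sum_{\nu'=1}^{n} \alpha_{\nu} \alpha_{\nu'}^{*} \int_{-1/2}^{1/2} S(\theta)\, e^{-\mathrm{i}2\pi(\nu-\nu')\theta} \, d\theta \\
&= \int_{-1/2}^{1/2} S(\theta) \Bigl( \sum_{\nu=1}^{n}\sum_{\nu'=1}^{n} \alpha_{\nu} \alpha_{\nu'}^{*}\, e^{-\mathrm{i}2\pi(\nu-\nu')\theta} \Bigr) d\theta \\
&= \int_{-1/2}^{1/2} S(\theta) \Bigl( \sum_{\nu=1}^{n}\sum_{\nu'=1}^{n} \alpha_{\nu}\, e^{-\mathrm{i}2\pi\nu\theta}\, \alpha_{\nu'}^{*}\, e^{\mathrm{i}2\pi\nu'\theta} \Bigr) d\theta \\
&= \int_{-1/2}^{1/2} S(\theta) \Bigl( \sum_{\nu=1}^{n} \alpha_{\nu}\, e^{-\mathrm{i}2\pi\nu\theta} \Bigr) \Bigl( \sum_{\nu'=1}^{n} \alpha_{\nu'}\, e^{-\mathrm{i}2\pi\nu'\theta} \Bigr)^{*} d\theta \\
&= \int_{-1/2}^{1/2} S(\theta)\, \Bigl| \sum_{\nu=1}^{n} \alpha_{\nu}\, e^{-\mathrm{i}2\pi\nu\theta} \Bigr|^{2} d\theta \\
&\ge 0. \qquad \Box
\end{aligned}
\]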

Proposition 13.6.6 needs very little alteration. We only need to drop the symmetry
property:

Proposition 17.5.8 (PSD when $K_{ZZ}$ Is Absolutely Summable). If the autocovariance
function $K_{ZZ}$ of a discrete-time WSS CSP is absolutely summable, i.e.,

\[ \sum_{\eta=-\infty}^{\infty} |K_{ZZ}(\eta)| < \infty, \tag{17.67} \]

then the function

\[ S(\theta) = \sum_{\eta=-\infty}^{\infty} K_{ZZ}(\eta)\, e^{\mathrm{i}2\pi\eta\theta}, \quad \theta \in [-1/2, 1/2] \tag{17.68} \]

is continuous, nonnegative, and satisfies

\[ \int_{-1/2}^{1/2} S(\theta)\, e^{-\mathrm{i}2\pi\eta\theta} \, d\theta = K_{ZZ}(\eta), \quad \eta \in \mathbb{Z}. \tag{17.69} \]
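
As an illustration of (17.68), and of the fact that the PSD of a complex process need not be symmetric, consider the hypothetical absolutely summable autocovariance K_ZZ(η) = r^|η| e^{-i2πηθ₀} with 0 < r < 1. Summing the two geometric series gives the closed form (1 - r²)/(1 - 2r cos(2π(θ - θ₀)) + r²), which peaks at θ = θ₀. The sketch below compares a truncated version of the series with that closed form; the parameter values and function names are assumptions made for the example.

```python
import cmath
import math

R, THETA0 = 0.5, 0.2  # hypothetical decay rate and spectral offset


def autocov(eta):
    """K_ZZ(eta) = R^|eta| exp(-i 2 pi eta THETA0); conjugate-symmetric in eta."""
    return R ** abs(eta) * cmath.exp(-2j * math.pi * eta * THETA0)


def psd(theta, terms=60):
    """Truncated version of the series (17.68)."""
    return sum(autocov(eta) * cmath.exp(2j * math.pi * eta * theta)
               for eta in range(-terms, terms + 1))


def closed_form(theta):
    """Sum of the two geometric series making up (17.68) for this K_ZZ."""
    return (1 - R ** 2) / (1 - 2 * R * math.cos(2 * math.pi * (theta - THETA0)) + R ** 2)
```

With these parameters the PSD at θ = 0.2 is 3 while at θ = -0.2 it is well below 1, so the function is nonnegative but clearly not symmetric about the origin.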

The Spectral Distribution Function that we encountered in Section 13.7 has a 
natural extension to discrete-time WSS CSPs: 

Theorem 17.5.9.

(i) If $(Z_\nu)$ is a WSS CSP of autocovariance function $K_{ZZ}$, then

\[ K_{ZZ}(\eta) = K_{ZZ}(0)\, \mathsf{E}\bigl[ e^{\mathrm{i}2\pi\eta\Theta} \bigr], \quad \eta \in \mathbb{Z}, \tag{17.70} \]

for some random variable $\Theta$ taking value in the interval $[-1/2, 1/2)$. In
the nontrivial case where $K_{ZZ}(0) > 0$ the distribution function of $\Theta$ is fully
specified by $K_{ZZ}$.

(ii) If $\Theta$ is any random variable taking value in $[-1/2, 1/2)$ and if $\alpha > 0$, then
there exists a proper discrete-time WSS CSP $(Z_\nu)$ whose autocovariance function
$K_{ZZ}$ is given by

\[ K_{ZZ}(\eta) = \alpha\, \mathsf{E}\bigl[ e^{\mathrm{i}2\pi\eta\Theta} \bigr], \quad \eta \in \mathbb{Z} \tag{17.71} \]

and whose variance is consequently given by $K_{ZZ}(0) = \alpha$.

Proof. See (Shiryaev, 1996, Chapter VI, Section 1, Theorem 3), (Doob, 1990,
Chapter X, Section 3, Theorem 3.2), or (Feller, 1971, Chapter XIX, Section 6, Theorem 3).

□

Some authors refer to the mapping $\theta \mapsto \Pr[\Theta \le \theta]$ as the spectral distribution function
of $(Z_\nu)$, but others refer to $\theta \mapsto K_{ZZ}(0) \Pr[\Theta \le \theta]$ as the spectral distribution
function. The latter is more common.

17.6 On the Eigenvalues of Large Toeplitz Matrices

Although it will not be used in this book, we cannot resist stating the following
classic result, which is sometimes called "Szegő's Theorem." Let the function
$s \colon [-1/2, 1/2] \to [0, \infty)$ be Lebesgue integrable. Define

\[ c_{\eta} = \int_{-1/2}^{1/2} s(\theta)\, e^{-\mathrm{i}2\pi\eta\theta} \, d\theta, \quad \eta \in \mathbb{Z}. \tag{17.72} \]

(In some applications $s(\cdot)$ is the PSD of a discrete-time real or complex stochastic
process and $c_\eta$ is the value of the corresponding autocovariance function at $\eta$.)

The $n \times n$ matrix

\[ \begin{pmatrix} c_0 & c_1 & \cdots & c_{n-1} \\ c_{-1} & c_0 & \cdots & c_{n-2} \\ \vdots & & \ddots & \vdots \\ c_{-n+1} & \cdots & & c_0 \end{pmatrix} \]

is positive semidefinite and conjugate-symmetric. Consequently, it has n nonnegative
eigenvalues (counting multiplicity), which we denote by

\[ \lambda_1^{(n)} \le \lambda_2^{(n)} \le \cdots \le \lambda_n^{(n)}. \tag{17.73} \]

As n increases (with $s(\cdot)$ fixed), the number of eigenvalues increases. It turns out
that we can say something quite precise about the distribution of these eigenvalues.

Theorem 17.6.1. Let $s \colon [-1/2, 1/2] \to [0, \infty)$ be integrable, and let $\lambda_j^{(n)}$ be as in
(17.73). Let $g \colon [0, \infty) \to \mathbb{R}$ be a continuous function such that the limit $\lim_{\xi \to \infty} g(\xi)/\xi$
exists and is finite. Then

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} g\bigl(\lambda_j^{(n)}\bigr) = \int_{-1/2}^{1/2} g\bigl(s(\theta)\bigr) \, d\theta. \tag{17.74} \]

Proof. For a proof of a more general statement of this theorem see (Simon, 2005,
Chapter 2, Section 7, Theorem 2.7.13). □
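
A small numerical sanity check of (17.74) is possible without computing any eigenvalues: for a conjugate-symmetric matrix the sum of the squared eigenvalues equals the sum of the squared magnitudes of its entries. The sketch below assumes the hypothetical density s(θ) = 1 + cos(2πθ), for which c₀ = 1, c₁ = c₋₁ = 1/2, and the integral of s² is 3/2, and it takes g(ξ) = ξ². That g does not satisfy the growth condition of the theorem, but since this s is bounded by 2, all the eigenvalues lie in [0, 2] and g could be altered outside that interval without changing either side.

```python
def c(eta):
    """Fourier coefficients of the hypothetical s(theta) = 1 + cos(2 pi theta)."""
    return {0: 1.0, 1: 0.5, -1: 0.5}.get(eta, 0.0)


def toeplitz(coeff, n):
    """n x n matrix whose (j, k) entry is coeff(k - j), as displayed before (17.73)."""
    return [[coeff(k - j) for k in range(n)] for j in range(n)]


def mean_square_eigenvalue(n):
    """(1/n) * sum of squared eigenvalues, via the squared Frobenius norm."""
    t = toeplitz(c, n)
    return sum(abs(x) ** 2 for row in t for x in row) / n


# The limit predicted by (17.74) with g(xi) = xi^2 is the integral of s^2.
predicted_limit = 1.5
```

For this example the left-hand side equals 3/2 - 1/(2n), so the gap to the predicted limit shrinks like 1/n.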



17.7 Exercises 

Exercise 17.1 (The Distribution of Re(Z) and |Z|). Let the CRV Z be uniformly
distributed over the unit disc $\{z \in \mathbb{C} : |z| \le 1\}$.

(i) What is the density of its real part Re(Z)?
(ii) What is the density of its magnitude |Z|?

Exercise 17.2 (The Density of $Z^2$). Let Z be a CRV of density $f_Z(\cdot)$. Express the density
of $Z^2$ in terms of $f_Z(\cdot)$.

Exercise 17.3 (The Conjugate of a Proper CRV). Must the complex conjugate of a proper
CRV be proper?

Exercise 17.4 (Product of Proper CRVs). Show that the product of independent proper 
complex random variables is proper. Is the assumption of independence essential? 

Exercise 17.5 (Sums of Proper CRVs). Show that the sum of independent proper complex 
random variables is proper. Is the assumption of independence essential? 



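Exercise 17.6 (Transforming Complex Random Vectors). Let $\mathbf{Z}$ be a complex n-vector
of PDF $f_{\mathbf{Z}}(\cdot)$. Let $\mathbf{W} = g(\mathbf{Z})$, where $g \colon \mathcal{D} \to \mathcal{R}$ is a one-to-one function from an open
subset $\mathcal{D}$ of $\mathbb{C}^n$ to $\mathcal{R} \subseteq \mathbb{C}^n$. Let the mappings $\mathbf{u}, \mathbf{v} \colon \mathbb{R}^{2n} \to \mathbb{R}^n$ be defined for $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$
as

\[ \mathbf{u} \colon (\mathbf{x}, \mathbf{y}) \mapsto \operatorname{Re}\bigl(g(\mathbf{x} + \mathrm{i}\mathbf{y})\bigr) \quad \text{and} \quad \mathbf{v} \colon (\mathbf{x}, \mathbf{y}) \mapsto \operatorname{Im}\bigl(g(\mathbf{x} + \mathrm{i}\mathbf{y})\bigr). \]

Assume that g is differentiable in $\mathcal{D}$ in the sense that for all $j, \ell \in \{1, \ldots, n\}$ the partial
derivatives

\[ \frac{\partial u^{(j)}(\mathbf{x}, \mathbf{y})}{\partial x^{(\ell)}}, \quad \frac{\partial u^{(j)}(\mathbf{x}, \mathbf{y})}{\partial y^{(\ell)}}, \quad \frac{\partial v^{(j)}(\mathbf{x}, \mathbf{y})}{\partial x^{(\ell)}}, \quad \frac{\partial v^{(j)}(\mathbf{x}, \mathbf{y})}{\partial y^{(\ell)}} \]

exist and are continuous in $\mathcal{D}$, and that they satisfy

\[ \frac{\partial u^{(j)}(\mathbf{x}, \mathbf{y})}{\partial x^{(\ell)}} = \frac{\partial v^{(j)}(\mathbf{x}, \mathbf{y})}{\partial y^{(\ell)}} \quad \text{and} \quad \frac{\partial u^{(j)}(\mathbf{x}, \mathbf{y})}{\partial y^{(\ell)}} = -\frac{\partial v^{(j)}(\mathbf{x}, \mathbf{y})}{\partial x^{(\ell)}}, \]

where $a^{(j)}$ denotes the j-th component of the vector $\mathbf{a}$. Further assume that the determinant
of the Jacobian matrix

\[ \det g'(\mathbf{z}) = \det \begin{pmatrix} \dfrac{\partial u^{(1)}(\mathbf{x}, \mathbf{y})}{\partial x^{(1)}} + \mathrm{i}\dfrac{\partial v^{(1)}(\mathbf{x}, \mathbf{y})}{\partial x^{(1)}} & \cdots & \dfrac{\partial u^{(1)}(\mathbf{x}, \mathbf{y})}{\partial x^{(n)}} + \mathrm{i}\dfrac{\partial v^{(1)}(\mathbf{x}, \mathbf{y})}{\partial x^{(n)}} \\ \vdots & & \vdots \\ \dfrac{\partial u^{(n)}(\mathbf{x}, \mathbf{y})}{\partial x^{(1)}} + \mathrm{i}\dfrac{\partial v^{(n)}(\mathbf{x}, \mathbf{y})}{\partial x^{(1)}} & \cdots & \dfrac{\partial u^{(n)}(\mathbf{x}, \mathbf{y})}{\partial x^{(n)}} + \mathrm{i}\dfrac{\partial v^{(n)}(\mathbf{x}, \mathbf{y})}{\partial x^{(n)}} \end{pmatrix} \]

is at no point in $\mathcal{D}$ zero. Show that the density $f_{\mathbf{W}}(\cdot)$ of $\mathbf{W}$ is given by

\[ f_{\mathbf{W}}(\mathbf{w}) = \frac{f_{\mathbf{Z}}(\mathbf{z})}{|\det g'(\mathbf{z})|^{2}} \bigg|_{\mathbf{z} = g^{-1}(\mathbf{w})} \, \mathrm{I}\{\mathbf{w} \in \mathcal{R}\}. \]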



Exercise 17.7 (The Cauchy-Schwarz Inequality Revisited). Let $(Z_\nu)$ be a discrete-time
WSS CSP. Show that (17.61) implies

\[ \bigl| \operatorname{Cov}[Z_{\ell}, Z_{\ell'}] \bigr| \le \operatorname{Var}[Z_1], \quad \ell, \ell' \in \mathbb{Z}. \]



Exercise 17.8 (On the Autocovariance Function of a Discrete-Time CSP). Show that
if $K_{ZZ}$ is the autocovariance function of a discrete-time WSS CSP, then for every $n \in \mathbb{N}$,
the matrix

\[ \begin{pmatrix} K_{ZZ}(0) & K_{ZZ}(1) & \cdots & K_{ZZ}(n-1) \\ K_{ZZ}(-1) & K_{ZZ}(0) & \cdots & K_{ZZ}(n-2) \\ \vdots & & \ddots & \vdots \\ K_{ZZ}(-n+1) & K_{ZZ}(-n+2) & \cdots & K_{ZZ}(0) \end{pmatrix} \]

is positive semidefinite.

Exercise 17.9 (Reversing the Direction of Time). Let $K_{ZZ}$ be the autocovariance function
of some discrete-time WSS CSP $(Z_\nu)$. For every $\nu \in \mathbb{Z}$ define $Y_\nu \triangleq Z_{-\nu}$. Show that the
time-reversed CSP $(Y_\nu)$ is also a WSS CSP, and express its autocovariance function $K_{YY}$
in terms of $K_{ZZ}$.

Exercise 17.10 (The Sum of Autocovariance Functions). Show that the sum of the
autocovariance functions of two discrete-time WSS complex stochastic processes is the
autocovariance function of some discrete-time WSS CSP.

Exercise 17.11 (The Real Part of an Autocovariance Function). Let $K_{ZZ}$ be the
autocovariance function of some discrete-time WSS CSP $(Z_\nu)$. Show that the mapping
$m \mapsto \operatorname{Re}(K_{ZZ}(m))$ is the autocovariance function of some real SP. Is this also true for the
mapping $m \mapsto \operatorname{Im}(K_{ZZ}(m))$?

Exercise 17.12 (Rotating a WSS CSP). Let $(Z_\ell)$ be a zero-mean WSS discrete-time CSP,
and let $\alpha \in \mathbb{C}$ be fixed. Define the new CSP $(W_\ell)$ as $W_\ell \triangleq \alpha^{\ell} Z_\ell$ for every $\ell \in \mathbb{Z}$.

(i) Show that if $|\alpha| = 1$ then $(W_\ell)$ is WSS. Compute its autocovariance function.
(ii) Does your answer change if $\alpha$ is not of unit magnitude?



Chapter 18 

Energy, Power, and PSD in QAM 

18.1 Introduction 

The calculations of the power and of the operational power spectral density in
QAM are not just repetitions of the analogous PAM calculations with complex
notation. They contain two new elements that we shall try to highlight. The
first is the relationship between the power (as opposed to energy) in passband and
baseband, and the second is the fact that the energy and power in transmitting
the complex symbols $\{C_\ell\}$ are only related to expectations of the form $\mathsf{E}[C_\ell C_{\ell'}^{*}]$;
they are uninfluenced by those of the form $\mathsf{E}[C_\ell C_{\ell'}]$.

The signal $(X(t),\ t \in \mathbb{R})$ (or X for short) that we consider is given by

\[ X(t) = 2\operatorname{Re}\bigl( X_{\mathrm{BB}}(t)\, e^{\mathrm{i}2\pi f_{\mathrm{c}} t} \bigr), \quad t \in \mathbb{R}, \tag{18.1} \]

where

\[ X_{\mathrm{BB}}(t) = A \sum_{\ell} C_\ell\, g(t - \ell T_{\mathrm{s}}), \quad t \in \mathbb{R}. \tag{18.2} \]

Here A > 0 is real; the symbols $\{C_\ell\}$ are complex random variables; the pulse
shape g is an integrable complex function that is bandlimited to W/2 Hz; $T_{\mathrm{s}}$ is
positive; and $f_{\mathrm{c}} > W/2$. The range of the summation will depend on the modes
we discuss.

Our focus in this chapter is on X's energy, power, and operational PSD. These 
quantities are studied in Sections 18.2-18.4, albeit without all the fine mathemat- 
ical details. Those are provided in Sections 18.5 & 18.6, which are recommended 
for the more mathematical readers. The definition of the operational PSD of com- 
plex stochastic processes is very similar to the one of real stochastic processes 
(Definition 15.3.1). It is given in Section 18.4 (Definition 18.4.1). 



18.2 The Energy in QAM 

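As in our treatment in Chapter 14 of PAM, we begin with an analysis of the energy
in transmitting K IID random bits $D_1, \ldots, D_K$. We assume that the data bits
are mapped to N complex symbols $C_1, \ldots, C_N$ using a (K, N) binary-to-complex
block-encoder

\[ \mathrm{enc} \colon \{0, 1\}^{K} \to \mathbb{C}^{N} \tag{18.3} \]

of rate

\[ \frac{K}{N} \ \left[ \frac{\text{bit}}{\text{complex symbol}} \right]. \]

The transmitted signal is then:

\[
\begin{aligned}
X(t) &= 2\operatorname{Re}\bigl( X_{\mathrm{BB}}(t)\, e^{\mathrm{i}2\pi f_{\mathrm{c}} t} \bigr) && (18.4) \\
&= 2\operatorname{Re}\Bigl( A \sum_{\ell=1}^{N} C_\ell\, g(t - \ell T_{\mathrm{s}})\, e^{\mathrm{i}2\pi f_{\mathrm{c}} t} \Bigr), \quad t \in \mathbb{R}, && (18.5)
\end{aligned}
\]

where the baseband representation of the transmitted signal is

\[ X_{\mathrm{BB}}(t) = A \sum_{\ell=1}^{N} C_\ell\, g(t - \ell T_{\mathrm{s}}), \quad t \in \mathbb{R}. \tag{18.6} \]

Our interest is in the energy E in X, which is defined by

\[ E \triangleq \mathsf{E}\Bigl[ \int_{-\infty}^{\infty} X^{2}(t) \, dt \Bigr]. \tag{18.7} \]
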

Our assumption that the pulse shape g is bandlimited to W/2 Hz implies that
for every realization of the symbols $\{C_\ell\}$, the signal $X_{\mathrm{BB}}(\cdot)$ is also bandlimited
to W/2 Hz. And since we assume that $f_{\mathrm{c}} > W/2$, it follows from Theorem 7.6.10
that the energy in the passband signal $X(\cdot)$ is twice the energy in its baseband
representation $X_{\mathrm{BB}}(\cdot)$, i.e.,

\[ E = 2\, \mathsf{E}\Bigl[ \int_{-\infty}^{\infty} |X_{\mathrm{BB}}(t)|^{2} \, dt \Bigr]. \tag{18.8} \]



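We can thus compute the energy in $X(\cdot)$ by computing the energy in $X_{\mathrm{BB}}(\cdot)$ and
doubling the result. The energy of the baseband signal can be computed in much
the same way that the energy was computed in Section 14.2 for PAM. The only
difference is that the baseband signal is now complex:

\[
\begin{aligned}
\mathsf{E}\Bigl[ \int_{-\infty}^{\infty} |X_{\mathrm{BB}}(t)|^{2} \, dt \Bigr]
&= \mathsf{E}\Bigl[ \int_{-\infty}^{\infty} \Bigl| A \sum_{\ell=1}^{N} C_\ell\, g(t - \ell T_{\mathrm{s}}) \Bigr|^{2} dt \Bigr] \\
&= \mathsf{E}\Bigl[ \int_{-\infty}^{\infty} \Bigl( A \sum_{\ell=1}^{N} C_\ell\, g(t - \ell T_{\mathrm{s}}) \Bigr) \Bigl( A \sum_{\ell'=1}^{N} C_{\ell'}\, g(t - \ell' T_{\mathrm{s}}) \Bigr)^{*} dt \Bigr] \\
&= A^{2} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^{*}] \int_{-\infty}^{\infty} g(t - \ell T_{\mathrm{s}})\, g^{*}(t - \ell' T_{\mathrm{s}}) \, dt \\
&= A^{2} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^{*}]\, R_{gg}\bigl( (\ell' - \ell) T_{\mathrm{s}} \bigr),
\end{aligned}
\tag{18.9}
\]
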
where $R_{gg}$ is the self-similarity function of the pulse shape g (Definition 11.2.1),
i.e.,

\[ R_{gg}(\tau) = \int_{-\infty}^{\infty} g(t + \tau)\, g^{*}(t) \, dt, \quad \tau \in \mathbb{R}. \tag{18.10} \]



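This expression for the energy in $X_{\mathrm{BB}}(\cdot)$ is greatly simplified if the symbols $\{C_\ell\}$
are of zero mean and uncorrelated:

\[
\mathsf{E}\Bigl[ \int_{-\infty}^{\infty} |X_{\mathrm{BB}}(t)|^{2} \, dt \Bigr] = A^{2}\, \|g\|_{2}^{2} \sum_{\ell=1}^{N} \mathsf{E}\bigl[ |C_\ell|^{2} \bigr],
\quad \Bigl( \mathsf{E}[C_\ell C_{\ell'}^{*}] = \mathsf{E}\bigl[|C_\ell|^{2}\bigr]\, \mathrm{I}\{\ell = \ell'\}, \ \ell, \ell' \in \{1, \ldots, N\} \Bigr),
\tag{18.11}
\]

or if the time shifts of the pulse shape by integer multiples of $T_{\mathrm{s}}$ are orthonormal:

\[
\mathsf{E}\Bigl[ \int_{-\infty}^{\infty} |X_{\mathrm{BB}}(t)|^{2} \, dt \Bigr] = A^{2} \sum_{\ell=1}^{N} \mathsf{E}\bigl[ |C_\ell|^{2} \bigr],
\quad \Bigl( \int_{-\infty}^{\infty} g(t - \ell T_{\mathrm{s}})\, g^{*}(t - \ell' T_{\mathrm{s}}) \, dt = \mathrm{I}\{\ell = \ell'\}, \ \ell, \ell' \in \{1, \ldots, N\} \Bigr).
\tag{18.12}
\]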



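Since g is an integrable function that is bandlimited to W/2 Hz, it is also
energy-limited (Note 6.4.12). Consequently, by Proposition 11.2.2 (iv), we can express
the self-similarity function $R_{gg}$ in (18.9) as the Inverse Fourier Transform of the
mapping $f \mapsto |\hat{g}(f)|^{2}$:

\[ R_{gg}(\tau) = \int_{-\infty}^{\infty} |\hat{g}(f)|^{2}\, e^{\mathrm{i}2\pi f \tau} \, df, \quad \tau \in \mathbb{R}. \tag{18.13} \]

With this representation of $R_{gg}$ we obtain from (18.9) an equivalent representation
of the energy as

\[
\mathsf{E}\Bigl[ \int_{-\infty}^{\infty} |X_{\mathrm{BB}}(t)|^{2} \, dt \Bigr]
= A^{2} \int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^{*}]\, e^{\mathrm{i}2\pi f (\ell' - \ell) T_{\mathrm{s}}}\, |\hat{g}(f)|^{2} \, df.
\tag{18.14}
\]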

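Using (18.8), (18.9), and (18.14) we obtain:

Theorem 18.2.1 (Energy in QAM). Assume that A > 0, that $T_{\mathrm{s}} > 0$, that $g \colon \mathbb{R} \to \mathbb{C}$
is an integrable signal that is bandlimited to W/2 Hz, and that $f_{\mathrm{c}} > W/2$. Then
the energy E in the QAM signal $X(\cdot)$ of (18.5) is given by

\[
\begin{aligned}
E &= 2A^{2} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^{*}]\, R_{gg}\bigl( (\ell' - \ell) T_{\mathrm{s}} \bigr) && (18.15) \\
&= 2A^{2} \int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^{*}]\, e^{\mathrm{i}2\pi f (\ell' - \ell) T_{\mathrm{s}}}\, |\hat{g}(f)|^{2} \, df, && (18.16)
\end{aligned}
\]

whenever all the complex random variables $C_1, \ldots, C_N$ are of finite variance

\[ \mathsf{E}\bigl[ |C_\ell|^{2} \bigr] < \infty, \quad \ell = 1, \ldots, N. \tag{18.17} \]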
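
The double sum (18.15) is straightforward to evaluate once the symbol correlations and the samples of the self-similarity function at multiples of T_s are known. The following sketch uses made-up values for both (they are assumptions, not taken from the text), and the function names are introduced here for illustration.

```python
def qam_energy(amplitude, symbol_corr, rgg_at_multiple):
    """Evaluate (18.15).

    symbol_corr[ell][ellp] holds E[C_ell C_ellp^*] (0-based indices);
    rgg_at_multiple(k) returns R_gg(k * T_s).
    """
    n = len(symbol_corr)
    total = sum(symbol_corr[ell][ellp] * rgg_at_multiple(ellp - ell)
                for ell in range(n) for ellp in range(n))
    return 2 * amplitude ** 2 * total


def rgg_at_multiple(k):
    # Hypothetical samples of R_gg at multiples of T_s (unit-energy pulse).
    return {0: 1.0, 1: 0.2, -1: 0.2}.get(k, 0.0)


N = 3
# Uncorrelated symbols with E[|C|^2] = 2, e.g. 4-QAM points of the form +-1 +- i.
uncorrelated = [[2.0 if ell == ellp else 0.0 for ellp in range(N)] for ell in range(N)]
# The same symbol repeated N times, so every correlation equals E[|C|^2] = 2.
identical = [[2.0] * N for _ in range(N)]

energy_uncorrelated = qam_energy(1.0, uncorrelated, rgg_at_multiple)
energy_identical = qam_energy(1.0, identical, rgg_at_multiple)
```

In the uncorrelated case only the diagonal terms survive and the result agrees with twice (18.11); in the repeated-symbol case the overlap between adjacent pulses contributes additional energy.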
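
In analogy to PAM, we define the energy per bit $E_{\mathrm{b}}$ by

\[ E_{\mathrm{b}} \triangleq \frac{E}{K} \tag{18.18} \]

and the energy per complex symbol $E_{\mathrm{s}}$ by

\[ E_{\mathrm{s}} \triangleq \frac{E}{N}. \tag{18.19} \]

Using Theorem 18.2.1, we obtain

\[
\begin{aligned}
E_{\mathrm{s}} &= \frac{2}{N} A^{2} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^{*}]\, R_{gg}\bigl( (\ell' - \ell) T_{\mathrm{s}} \bigr) && (18.20) \\
&= \frac{2}{N} A^{2} \int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^{*}]\, e^{\mathrm{i}2\pi f (\ell' - \ell) T_{\mathrm{s}}}\, |\hat{g}(f)|^{2} \, df. && (18.21)
\end{aligned}
\]

Notice that, as promised, only terms of the form $\mathsf{E}[C_\ell C_{\ell'}^{*}]$ influence the energy;
terms of the form $\mathsf{E}[C_\ell C_{\ell'}]$ do not appear in this analysis.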


18.3 The Power in QAM 

In order to discuss the power in QAM we must consider the transmission of an
infinite sequence of complex symbols $(C_\ell)$. To guarantee convergence, we shall
assume that the pulse shape g, in addition to being an integrable signal that is
bandlimited to W/2 Hz, also satisfies the decay condition

\[ |g(t)| \le \frac{\beta}{1 + |t/T_{\mathrm{s}}|^{1+\alpha}}, \quad t \in \mathbb{R} \tag{18.22} \]

for some $\alpha, \beta > 0$. Also, we shall only consider the transmission of bi-infinite
sequences $(C_\ell)$ that are bounded in the sense that there exists some $\gamma > 0$ such
that every realization of $(C_\ell)$ satisfies

\[ |C_\ell| \le \gamma, \quad \ell \in \mathbb{Z}. \tag{18.23} \]

As for PAM, we shall treat three different scenarios for the generation of $(C_\ell)$. In
the first, we simply ignore the mechanism by which the sequence $(C_\ell)$ is generated
and assume that it forms a wide-sense stationary complex stochastic process. In
the second, we assume bi-infinite block encoding. And in the third we relax the
statistical assumptions and consider the case where the time shifts of g by integer
multiples of $T_{\mathrm{s}}$ are orthonormal. In all these cases the transmitted waveform is
given by

\[ X(t) = 2\operatorname{Re}\bigl( X_{\mathrm{BB}}(t)\, e^{\mathrm{i}2\pi f_{\mathrm{c}} t} \bigr), \quad t \in \mathbb{R}, \tag{18.24} \]

where

\[ X_{\mathrm{BB}}(t) = A \sum_{\ell=-\infty}^{\infty} C_\ell\, g(t - \ell T_{\mathrm{s}}), \quad t \in \mathbb{R}. \tag{18.25} \]
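
It is tempting to derive the power in $X(\cdot)$ by using the complex version of the
PAM results of Section 14.5 to compute the power in $X_{\mathrm{BB}}(\cdot)$ and then doubling
the result. This turns out to be a valid approach, but its justification requires some
work. The difficulty is that the powers are defined as

\[ \lim_{\mathsf{T} \to \infty} \frac{1}{2\mathsf{T}}\, \mathsf{E}\Bigl[ \int_{-\mathsf{T}}^{\mathsf{T}} X^{2}(t) \, dt \Bigr] \]

and

\[ \lim_{\mathsf{T} \to \infty} \frac{1}{2\mathsf{T}}\, \mathsf{E}\Bigl[ \int_{-\mathsf{T}}^{\mathsf{T}} |X_{\mathrm{BB}}(t)|^{2} \, dt \Bigr] \]

and, Theorem 7.6.10 notwithstanding,

\[ \frac{1}{2\mathsf{T}}\, \mathsf{E}\Bigl[ \int_{-\mathsf{T}}^{\mathsf{T}} X^{2}(t) \, dt \Bigr] \ne 2\, \frac{1}{2\mathsf{T}}\, \mathsf{E}\Bigl[ \int_{-\mathsf{T}}^{\mathsf{T}} |X_{\mathrm{BB}}(t)|^{2} \, dt \Bigr]. \tag{18.26} \]

The reason we cannot claim equality in (18.26) is that $t \mapsto X(t)\, \mathrm{I}\{|t| \le \mathsf{T}\}$ is not
bandlimited around $f_{\mathrm{c}}$, so Theorem 7.6.10, which relates energies in passband and
baseband, is not applicable. Nevertheless, it turns out that the limits as $\mathsf{T} \to \infty$ of
the RHS and the LHS of (18.26) do agree:

\[ \lim_{\mathsf{T} \to \infty} \frac{1}{2\mathsf{T}}\, \mathsf{E}\Bigl[ \int_{-\mathsf{T}}^{\mathsf{T}} X^{2}(t) \, dt \Bigr] = 2 \lim_{\mathsf{T} \to \infty} \frac{1}{2\mathsf{T}}\, \mathsf{E}\Bigl[ \int_{-\mathsf{T}}^{\mathsf{T}} |X_{\mathrm{BB}}(t)|^{2} \, dt \Bigr]. \tag{18.27} \]

Thus, the power in a QAM signal is, indeed, twice the power in its baseband
representation. This is stated more precisely in Theorem 18.5.2 and is proved in
Section 18.5. With the aid of (18.27) we can now readily compute the power in
QAM.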



18.3.1 $(C_\ell)$ Is Zero-Mean and WSS

We next ignore the mechanism by which the symbols $(C_\ell)$ are generated and merely
assume that they form a zero-mean WSS discrete-time CSP of autocovariance
function $K_{CC}$:

\[ \mathsf{E}[C_\ell] = 0, \quad \ell \in \mathbb{Z}, \tag{18.28a} \]

\[ \mathsf{E}[C_{\ell+m} C_{\ell}^{*}] = K_{CC}(m), \quad m, \ell \in \mathbb{Z}. \tag{18.28b} \]



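The calculation of the RHS of (18.27) is very similar to the analogous computation
in Section 14.5.1 for PAM. The only difference is that here $X_{\mathrm{BB}}(\cdot)$ is complex. As
in Section 14.5.1, we begin by computing the energy in a length-$T_{\mathrm{s}}$ interval:

\[
\begin{aligned}
\mathsf{E}\Bigl[ \int_{\tau}^{\tau+T_{\mathrm{s}}} |X_{\mathrm{BB}}(t)|^{2} \, dt \Bigr]
&= A^{2}\, \mathsf{E}\Bigl[ \int_{\tau}^{\tau+T_{\mathrm{s}}} \Bigl| \sum_{\ell=-\infty}^{\infty} C_\ell\, g(t - \ell T_{\mathrm{s}}) \Bigr|^{2} dt \Bigr] \\
&= A^{2}\, \mathsf{E}\Bigl[ \int_{\tau}^{\tau+T_{\mathrm{s}}} \sum_{\ell=-\infty}^{\infty} \sum_{\ell'=-\infty}^{\infty} C_\ell C_{\ell'}^{*}\, g(t - \ell T_{\mathrm{s}})\, g^{*}(t - \ell' T_{\mathrm{s}}) \, dt \Bigr] \\
&= A^{2} \int_{\tau}^{\tau+T_{\mathrm{s}}} \sum_{\ell=-\infty}^{\infty} \sum_{\ell'=-\infty}^{\infty} \mathsf{E}[C_\ell C_{\ell'}^{*}]\, g(t - \ell T_{\mathrm{s}})\, g^{*}(t - \ell' T_{\mathrm{s}}) \, dt \\
&= A^{2} \int_{\tau}^{\tau+T_{\mathrm{s}}} \sum_{m=-\infty}^{\infty} \sum_{\ell'=-\infty}^{\infty} \mathsf{E}[C_{\ell'+m} C_{\ell'}^{*}]\, g\bigl(t - (\ell' + m) T_{\mathrm{s}}\bigr)\, g^{*}(t - \ell' T_{\mathrm{s}}) \, dt \\
&= A^{2} \int_{\tau}^{\tau+T_{\mathrm{s}}} \sum_{m=-\infty}^{\infty} K_{CC}(m) \sum_{\ell'=-\infty}^{\infty} g\bigl(t - (\ell' + m) T_{\mathrm{s}}\bigr)\, g^{*}(t - \ell' T_{\mathrm{s}}) \, dt \\
&= A^{2} \sum_{m=-\infty}^{\infty} K_{CC}(m) \sum_{\ell'=-\infty}^{\infty} \int_{\tau - \ell' T_{\mathrm{s}}}^{\tau + T_{\mathrm{s}} - \ell' T_{\mathrm{s}}} g(t' - m T_{\mathrm{s}})\, g^{*}(t') \, dt' \\
&= A^{2} \sum_{m=-\infty}^{\infty} K_{CC}(m) \int_{-\infty}^{\infty} g^{*}(t')\, g(t' - m T_{\mathrm{s}}) \, dt' \\
&= A^{2} \sum_{m=-\infty}^{\infty} K_{CC}(m)\, R_{gg}^{*}(m T_{\mathrm{s}}),
\end{aligned}
\tag{18.29}
\]

where we have substituted $\ell' + m$ for $\ell$ (fourth equality) and $t'$ for $t - \ell' T_{\mathrm{s}}$ (sixth
equality).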

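As in the analogous analysis for real PAM signals, we lower-bound the energy of
$X_{\mathrm{BB}}(\cdot)$ in the interval $[-\mathsf{T}, +\mathsf{T}]$ by

\[ \Bigl\lfloor \frac{2\mathsf{T}}{T_{\mathrm{s}}} \Bigr\rfloor\, \mathsf{E}\Bigl[ \int_{\tau}^{\tau+T_{\mathrm{s}}} |X_{\mathrm{BB}}(t)|^{2} \, dt \Bigr] \]

and upper-bound it by

\[ \Bigl\lceil \frac{2\mathsf{T}}{T_{\mathrm{s}}} \Bigr\rceil\, \mathsf{E}\Bigl[ \int_{\tau}^{\tau+T_{\mathrm{s}}} |X_{\mathrm{BB}}(t)|^{2} \, dt \Bigr] \]

so, by the Sandwich Theorem,

\[ \lim_{\mathsf{T} \to \infty} \frac{1}{2\mathsf{T}}\, \mathsf{E}\Bigl[ \int_{-\mathsf{T}}^{\mathsf{T}} |X_{\mathrm{BB}}(t)|^{2} \, dt \Bigr] = \frac{1}{T_{\mathrm{s}}}\, \mathsf{E}\Bigl[ \int_{\tau}^{\tau+T_{\mathrm{s}}} |X_{\mathrm{BB}}(t)|^{2} \, dt \Bigr]. \tag{18.30} \]

It thus follows from (18.30) and (18.29) that the power $P_{\mathrm{BB}}$ in $X_{\mathrm{BB}}(\cdot)$ is

\[
\begin{aligned}
P_{\mathrm{BB}} &= \frac{A^{2}}{T_{\mathrm{s}}} \sum_{m=-\infty}^{\infty} K_{CC}(m)\, R_{gg}^{*}(m T_{\mathrm{s}}) && (18.31) \\
&= \frac{A^{2}}{T_{\mathrm{s}}} \int_{-\infty}^{\infty} \sum_{m=-\infty}^{\infty} K_{CC}(m)\, e^{-\mathrm{i}2\pi f m T_{\mathrm{s}}}\, |\hat{g}(f)|^{2} \, df, && (18.32)
\end{aligned}
\]

where the second equality follows from (18.13).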

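Since the power in passband is twice the power in baseband, we conclude:

Theorem 18.3.1. Let the QAM SP $(X(t))$ be given by (18.24) & (18.25), where
A, $T_{\mathrm{s}}$, g, W, and $f_{\mathrm{c}}$ are as in Theorem 18.2.1. Further assume that g satisfies the
decay condition (18.22) and that the discrete-time SP $(C_\ell)$ is bounded in the sense
of (18.23). If $(C_\ell)$ satisfies (18.28), then $(X(t))$ is a measurable SP, and

\[
\begin{aligned}
\lim_{\mathsf{T} \to \infty} \frac{1}{2\mathsf{T}}\, \mathsf{E}\Bigl[ \int_{-\mathsf{T}}^{\mathsf{T}} X^{2}(t) \, dt \Bigr]
&= \frac{2A^{2}}{T_{\mathrm{s}}} \sum_{m=-\infty}^{\infty} K_{CC}(m)\, R_{gg}^{*}(m T_{\mathrm{s}}) && (18.33) \\
&= \frac{2A^{2}}{T_{\mathrm{s}}} \int_{-\infty}^{\infty} \sum_{m=-\infty}^{\infty} K_{CC}(m)\, e^{-\mathrm{i}2\pi f m T_{\mathrm{s}}}\, |\hat{g}(f)|^{2} \, df. && (18.34)
\end{aligned}
\]

Proof. Follows by combining (18.27) (Theorem 18.5.2) and Theorem 14.6.4 (which
extends to the case where the pulse shape and the symbols are complex). □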
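
A direct numerical reading of (18.33) follows. The sketch assumes that both the autocovariance of the symbols and the samples of the self-similarity function vanish beyond lag 1; the particular values are invented for illustration (each sequence is conjugate-symmetric with a nonnegative spectrum), and `qam_power` is a name introduced here.

```python
def qam_power(amplitude, ts, kcc, rgg_at_multiple, max_lag):
    """Evaluate (18.33): (2 A^2 / T_s) * sum_m K_CC(m) * conj(R_gg(m T_s)).

    The sum is truncated to |m| <= max_lag, which is exact when the
    terms vanish beyond that lag.
    """
    total = sum(kcc(m) * rgg_at_multiple(m).conjugate()
                for m in range(-max_lag, max_lag + 1))
    return 2 * amplitude ** 2 / ts * total


def kcc(m):
    # Hypothetical autocovariance of the symbols: K_CC(-m) = K_CC(m)^*.
    return {0: 2 + 0j, 1: 1j, -1: -1j}.get(m, 0j)


def rgg_at_multiple(m):
    # Hypothetical samples R_gg(m T_s) of a complex unit-energy pulse.
    return {0: 1 + 0j, 1: 0.2j, -1: -0.2j}.get(m, 0j)


power = qam_power(amplitude=1.0, ts=1.0, kcc=kcc,
                  rgg_at_multiple=rgg_at_multiple, max_lag=1)
```

The two off-diagonal terms each contribute 0.2, so the sum is 2.4 and the power is 4.8. The result is real even though the individual factors are complex, as it must be for a power.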

18.3.2 Bi-lnfinite Block-Mode 

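The second scenario we consider is when $(C_\ell)$ is generated, as in Section 14.5.2, by
applying a binary-to-complex block-encoder $\mathrm{enc} \colon \{0, 1\}^{K} \to \mathbb{C}^{N}$ to bi-infinite IID
random bits $(D_j)$. As in Section 14.5.2, we assume that the encoder, when fed IID
random bits, produces symbols of zero mean.

By extending the results of Section 14.5.2 to complex pulse shapes and complex
symbols, we obtain that the power in $X_{\mathrm{BB}}(\cdot)$ is given by:

\[
\begin{aligned}
P_{\mathrm{BB}} &= \frac{1}{N T_{\mathrm{s}}}\, \mathsf{E}\Bigl[ \int_{-\infty}^{\infty} \Bigl| A \sum_{\ell=1}^{N} C_\ell\, g(t - \ell T_{\mathrm{s}}) \Bigr|^{2} dt \Bigr] && (18.35) \\
&= \frac{A^{2}}{N T_{\mathrm{s}}} \int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^{*}]\, e^{\mathrm{i}2\pi f (\ell' - \ell) T_{\mathrm{s}}}\, |\hat{g}(f)|^{2} \, df. && (18.36)
\end{aligned}
\]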



Using the relationship between power in baseband and passband (18.27) and using
the definitions of E (18.8) and of $E_{\mathrm{s}}$ (18.19), we obtain:

Theorem 18.3.2. Under the assumptions of Theorem 18.3.1, if the symbols $(C_\ell)$
are generated from IID random bits $(D_j)$ in bi-infinite block-mode using the encoder
enc(·), where enc(·) produces zero-mean symbols when fed IID random bits, then
$(X(t))$ is a measurable SP, and

\[ \lim_{\mathsf{T} \to \infty} \frac{1}{2\mathsf{T}}\, \mathsf{E}\Bigl[ \int_{-\mathsf{T}}^{\mathsf{T}} X^{2}(t) \, dt \Bigr] = \frac{E_{\mathrm{s}}}{T_{\mathrm{s}}}, \tag{18.37} \]

where the energy per symbol $E_{\mathrm{s}}$ is defined in (18.19) and is given by (18.20) or
(18.21).



Proof. Follows from Theorem 18.5.2 and by noting that Theorem 14.6.5 also ex- 
tends to the case where the pulse shape and the symbols are complex. □ 

18.3.3 Time Shifts of Pulse Shape Are Orthonormal 

We finally address the third scenario where the time shifts of the pulse shape by
integer multiples of $T_{\mathrm{s}}$ are orthonormal. This situation is very prevalent in Digital
Communications and allows for significant simplifications. In this setting we denote
the pulse shape by $\phi(\cdot)$ and state the orthonormality as

\[ \int_{-\infty}^{\infty} \phi(t - \ell T_{\mathrm{s}})\, \phi^{*}(t - \ell' T_{\mathrm{s}}) \, dt = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}. \tag{18.38} \]

The transmitted signal $(X(t),\ t \in \mathbb{R})$ is thus given as in (18.24) but with

\[ X_{\mathrm{BB}}(t) = A \sum_{\ell=-\infty}^{\infty} C_\ell\, \phi(t - \ell T_{\mathrm{s}}), \quad t \in \mathbb{R}, \tag{18.39} \]

where we assume that the discrete-time CSP $(C_\ell)$ satisfies the boundedness
condition (18.23) and that the complex pulse shape $\phi(\cdot)$ satisfies the orthogonality
condition (18.38) and the decay condition

\[ |\phi(t)| \le \frac{\beta}{1 + |t/T_{\mathrm{s}}|^{1+\alpha}}, \quad t \in \mathbb{R}, \tag{18.40} \]

for some $\alpha, \beta > 0$.

Computing the power in $(X_{\mathrm{BB}}(t),\ t \in \mathbb{R})$ using Theorem 14.5.2, which easily
extends to the complex case, we obtain from (18.27):

Theorem 18.3.3. Let the SP $(X(t),\ t \in \mathbb{R})$ be given by

\[
X(t) = 2 \operatorname{Re}\biggl( A \sum_{\ell=-\infty}^{\infty} C_\ell\, \phi(t - \ell T_s)\, e^{i 2\pi f_c t} \biggr), \quad t \in \mathbb{R}, \tag{18.41}
\]

where $A > 0$; $T_s > 0$; the pulse shape $\phi\colon \mathbb{R} \to \mathbb{C}$ is an integrable function that is
bandlimited to W/2 Hz, is Borel measurable, satisfies the orthogonality condition
(18.38), and satisfies the decay condition (18.40); the carrier frequency $f_c$ satisfies
$f_c > W/2 > 0$; and where the CSP $(C_\ell)$ satisfies the boundedness condition (18.23).
Then $(X(t),\ t \in \mathbb{R})$ is a measurable stochastic process, and

\[
\lim_{T\to\infty} \frac{1}{2T}\, \mathsf{E}\biggl[ \int_{-T}^{T} X^2(t) \,\mathrm{d}t \biggr]
= \frac{2A^2}{T_s} \lim_{L\to\infty} \frac{1}{2L+1} \sum_{\ell=-L}^{L} \mathsf{E}\bigl[ |C_\ell|^2 \bigr], \tag{18.42}
\]

whenever the limit on the RHS exists.
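As a small numerical illustration of (18.42), the sketch below evaluates its RHS under the additional assumption (not part of the theorem) that the symbols are IID and uniform over a finite constellation, so that the Cesàro limit reduces to the mean squared magnitude of a constellation point. The function name `qam_power` and the 4-QAM example are hypothetical.

```python
def qam_power(constellation, amplitude, symbol_period):
    """RHS of (18.42), i.e. (2 A^2 / T_s) * E[|C|^2], assuming IID symbols
    drawn uniformly from `constellation` (an illustrative assumption)."""
    mean_energy = sum(abs(c) ** 2 for c in constellation) / len(constellation)
    return 2 * amplitude ** 2 * mean_energy / symbol_period


# Hypothetical 4-QAM constellation: every point has |C|^2 = 2.
four_qam = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
power = qam_power(four_qam, amplitude=1.0, symbol_period=0.5)
```

With $A = 1$, $T_s = 1/2$, and $\mathsf{E}[|C_\ell|^2] = 2$, the formula gives $2 \cdot 2 / 0.5 = 8$.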
18.4 The Operational PSD of QAM Signals

We shall compute the operational PSD of the QAM signal $(X(t),\ t \in \mathbb{R})$ (18.24) by
relating it to the operational PSD of the complex signal $(X_{\mathrm{BB}}(t),\ t \in \mathbb{R})$ (18.25)
and by then computing the operational PSD of the latter using techniques similar
to the ones we employed in Chapter 15 in our study of the operational PSD of real
PAM signals. But first we must define the operational PSD of complex stochastic
processes. The definition is very similar to that for real stochastic processes (Defi-
nition 15.3.1), but there are two issues to note. The first is that we do not require
that the operational PSD be a symmetric function, and the second is that we allow
for filters of complex impulse response.

Definition 18.4.1 (Operational PSD of a CSP). We say that a CSP $(Z(t),\ t \in \mathbb{R})$
is of operational power spectral density $S_{ZZ}$ if $(Z(t),\ t \in \mathbb{R})$ is a measurable
CSP;¹ the mapping $S_{ZZ}\colon \mathbb{R} \to \mathbb{R}$ is integrable; and for every integrable complex-
valued function $h\colon \mathbb{R} \to \mathbb{C}$ the average power of the convolution of $(Z(t),\ t \in \mathbb{R})$
and $h$ is given by

\[
\int_{-\infty}^{\infty} S_{ZZ}(f)\, |\hat{h}(f)|^2 \,\mathrm{d}f. \tag{18.43}
\]

By Lemma 15.3.2 (i) the PSD is unique: 

Note 18.4.2 (The Operational PSD Is Unique). The operational PSD of a CSP 
is unique in the sense that if a CSP is of two different operational power spectral 
densities, then the two must be indistinguishable. 

The relationship between the operational PSD of the real QAM signal $(X(t))$
(18.24) and of the CSP $(X_{\mathrm{BB}}(t))$ (18.25) turns out to be very simple. Indeed,
subject to the conditions that are made precise in Theorem 18.6.6, if the baseband
CSP $(X_{\mathrm{BB}}(t))$ is of operational PSD $S_{\mathrm{BB}}$, then the real QAM SP $(X(t))$ is of
operational PSD $S_{XX}$, where

\[
S_{XX}(f) = S_{\mathrm{BB}}\bigl( |f| - f_c \bigr), \quad f \in \mathbb{R}. \tag{18.44}
\]



This result is proved in Section 18.6 and relies heavily on the fact that g is band- 
limited to W/2 Hz and that f c > W/2. Here we shall only derive it heuristically 
and then see how to apply it. 
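To make the mapping in (18.44) concrete, here is a minimal sketch that builds the passband PSD from a given baseband PSD by evaluating the latter at $|f| - f_c$. The names `passband_psd` and `tilted_psd` are hypothetical, and the baseband PSD used is a made-up asymmetric example chosen only to show that the result is symmetric even when $S_{\mathrm{BB}}$ is not.

```python
def passband_psd(baseband_psd, carrier_freq):
    """Return f -> S_BB(|f| - f_c), the mapping of (18.44)."""
    return lambda f: baseband_psd(abs(f) - carrier_freq)


def tilted_psd(f):
    """Hypothetical asymmetric baseband PSD supported on |f| <= 1 (W/2 = 1)."""
    return 1.0 + f if abs(f) <= 1.0 else 0.0


# f_c = 10 exceeds W/2 = 1, as the derivation requires.
s_xx = passband_psd(tilted_psd, carrier_freq=10.0)
```

Here `s_xx(10.5)` and `s_xx(-10.5)` both equal `tilted_psd(0.5)`, while `s_xx(0.0)` vanishes because $|0| - f_c$ lies outside the baseband support.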

Recalling the definition of the operational PSD of a real SP (Definition 15.3.1), we
note that in order to derive (18.44) we need to show that its RHS is an integrable
symmetric function and that

\[
\text{Power in } X \star h = \int_{-\infty}^{\infty} |\hat{h}(f)|^2\, S_{\mathrm{BB}}\bigl( |f| - f_c \bigr) \,\mathrm{d}f, \tag{18.45}
\]

whenever $h\colon \mathbb{R} \to \mathbb{R}$ is integrable. The integrability of $f \mapsto S_{\mathrm{BB}}(|f| - f_c)$ follows
directly from the integrability of $S_{\mathrm{BB}}(\cdot)$. The symmetry is obvious because the RHS
of (18.44) depends on $f$ only via $|f|$. Our plan for computing the power in $X \star h$
is to first use the results of Section 7.6.7 to express the baseband representation
of $X \star h$ in the form $X_{\mathrm{BB}} \star h_{\mathrm{BB}}$, where $h_{\mathrm{BB}}$ is the baseband representation of the
result of passing $h$ through a unit-gain bandpass filter of bandwidth W around
the carrier frequency $f_c$. Using the relationship between power in passband and
baseband, this will allow us to express the power in $X \star h$ as twice the power in
$X_{\mathrm{BB}} \star h_{\mathrm{BB}}$. Expressing the power in the latter using the operational PSD $S_{\mathrm{BB}}(\cdot)$
of $X_{\mathrm{BB}}$ will allow us to complete the calculation of the power in $X \star h$.

¹ A complex stochastic process is said to be measurable if its real and imaginary parts are
measurable real stochastic processes.

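Before executing this plan, we pause here to heuristically argue that, loosely speak-
ing, the condition that $g$ is bandlimited to W/2 Hz implies that we may assume
that

\[
S_{\mathrm{BB}}(f) = 0, \quad |f| > \frac{W}{2}. \tag{18.46}
\]

For a precise statement of this result, see Proposition 18.6.3 in Section 18.6.2. The
intuition behind this statement is that, since $g$ is bandlimited to W/2 Hz, in some
loose sense, all the power of the signal $X_{\mathrm{BB}}$ is contained in the band $|f| \le W/2$.
To heuristically justify (18.46), we shall show that if $S_{\mathrm{BB}}(\cdot)$ is an operational PSD
for $(X_{\mathrm{BB}}(t))$, then so is the mapping $f \mapsto S_{\mathrm{BB}}(f)\, \mathrm{I}\{|f| \le W/2\}$. This follows by
noting that for every $h\colon \mathbb{R} \to \mathbb{C}$ in $\mathcal{L}_1$

\begin{align*}
\text{Power in } X_{\mathrm{BB}} \star h
&= \text{Power in } \Bigl( t \mapsto A \sum_{\ell \in \mathbb{Z}} C_\ell\, g(t - \ell T_s) \Bigr) \star h \\
&= \text{Power in } t \mapsto A \sum_{\ell \in \mathbb{Z}} C_\ell\, (g \star h)(t - \ell T_s) \\
&= \text{Power in } t \mapsto A \sum_{\ell \in \mathbb{Z}} C_\ell\, \bigl( (g \star \mathrm{LPF}_{W/2}) \star h \bigr)(t - \ell T_s) \\
&= \text{Power in } t \mapsto A \sum_{\ell \in \mathbb{Z}} C_\ell\, \bigl( g \star (h \star \mathrm{LPF}_{W/2}) \bigr)(t - \ell T_s) \\
&= \text{Power in } \Bigl( t \mapsto A \sum_{\ell \in \mathbb{Z}} C_\ell\, g(t - \ell T_s) \Bigr) \star (h \star \mathrm{LPF}_{W/2}) \\
&= \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f)\, \bigl| \hat{h}(f)\, \mathrm{I}\{|f| \le W/2\} \bigr|^2 \,\mathrm{d}f \\
&= \int_{-\infty}^{\infty} \bigl( S_{\mathrm{BB}}(f)\, \mathrm{I}\{|f| \le W/2\} \bigr)\, |\hat{h}(f)|^2 \,\mathrm{d}f,
\end{align*}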

from which the result follows from the uniqueness (to within indistinguishability)
of the operational PSD (Note 18.4.2). Here the first equality follows from the
definition of $X_{\mathrm{BB}}$ (18.25); the second because convolving a PAM signal of pulse
shape $g$ (in our case complex) with $h$ is tantamount to replacing the pulse shape $g$
with the new pulse shape $g \star h$ (see the derivation of (15.16) in Section 15.4, which
extends verbatim to the complex case); the third because, by assumption, $g$ is
bandlimited to W/2 Hz; the fourth by the associativity of convolution (see Theorem 5.6.1,
which, strictly speaking, is not applicable here because $\mathrm{LPF}_{W/2}$ is not
integrable); the fifth because replacing the pulse shape $g$ by $g \star (h \star \mathrm{LPF}_{W/2})$ is
tantamount to convolving the PAM signal with $h \star \mathrm{LPF}_{W/2}$; the sixth from our
assumption that $S_{\mathrm{BB}}(\cdot)$ is an operational PSD for $X_{\mathrm{BB}}$ (and by ignoring the fact
that $h \star \mathrm{LPF}_{W/2}$ need not be integrable); and the seventh by trivial algebra.

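Having established (18.46), we are now ready to compute the power in $X \star h$.
Using the results of Section 7.6.7 we obtain that for every integrable $h\colon \mathbb{R} \to \mathbb{R}$,
the baseband representation of $X \star h$ is given by $X_{\mathrm{BB}} \star h_{\mathrm{BB}}$, where $h_{\mathrm{BB}}\colon \mathbb{R} \to \mathbb{C}$ is
the baseband representation of the result of passing $h$ through a unit-gain bandpass
filter of bandwidth W around the carrier frequency $f_c$:

\[
\hat{h}_{\mathrm{BB}}(f) = \hat{h}(f + f_c)\, \mathrm{I}\{|f| \le W/2\}, \quad f \in \mathbb{R}. \tag{18.47}
\]

And since the power in passband is twice the power in baseband, we conclude that

\begin{align*}
\text{Power in } X \star h
&= 2\, \text{Power in } X_{\mathrm{BB}} \star h_{\mathrm{BB}} \\
&= 2 \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f)\, |\hat{h}_{\mathrm{BB}}(f)|^2 \,\mathrm{d}f \\
&= 2 \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f)\, |\hat{h}(f + f_c)|^2\, \mathrm{I}\{|f| \le W/2\} \,\mathrm{d}f \\
&= 2 \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f)\, |\hat{h}(f + f_c)|^2 \,\mathrm{d}f \\
&= 2 \int_{-\infty}^{\infty} S_{\mathrm{BB}}(\tilde{f} - f_c)\, |\hat{h}(\tilde{f})|^2 \,\mathrm{d}\tilde{f} \\
&= \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f - f_c)\, |\hat{h}(f)|^2 \,\mathrm{d}f + \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f - f_c)\, |\hat{h}(-f)|^2 \,\mathrm{d}f \\
&= \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f - f_c)\, |\hat{h}(f)|^2 \,\mathrm{d}f + \int_{-\infty}^{\infty} S_{\mathrm{BB}}(-f' - f_c)\, |\hat{h}(f')|^2 \,\mathrm{d}f' \\
&= \int_{-\infty}^{\infty} \bigl( S_{\mathrm{BB}}(f - f_c) + S_{\mathrm{BB}}(-f - f_c) \bigr)\, |\hat{h}(f)|^2 \,\mathrm{d}f \\
&= \int_{-\infty}^{\infty} S_{\mathrm{BB}}\bigl( |f| - f_c \bigr)\, |\hat{h}(f)|^2 \,\mathrm{d}f,
\end{align*}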



where the first equality follows because the power in passband is twice the power
in baseband; the second because $X_{\mathrm{BB}}$ is of operational PSD $S_{\mathrm{BB}}(\cdot)$; the third by
(18.47); the fourth by (18.46); the fifth by changing the integration variable to
$\tilde{f} = f + f_c$; the sixth because $h$ is real so its Fourier Transform must be conjugate-
symmetric; the seventh by changing the integration variable in the second integral
to $f' = -f$; the eighth by the linearity of integration; and the final equality by
(18.46) and the assumption that $f_c > W/2$. This establishes (18.45) and thus
concludes the proof of (18.44).

We next apply (18.44) to calculate the operational PSD of QAM in two scenarios:
when the complex symbols $(C_\ell)$ form a bounded, zero-mean, WSS, CSP and when
they are generated in bi-infinite block-mode.
18.4.1 $(C_\ell)$ Zero-Mean WSS and Bounded

We next use (18.44) to derive the operational PSD of QAM when the discrete-time
CSP $(C_\ell)$ is of zero mean and of autocovariance function $K_{CC}$; see (18.28). To use
(18.44) we first need to compute the operational PSD of the CSP $X_{\mathrm{BB}}$. This is
straightforward. As in Section 15.4.2, we note that $X_{\mathrm{BB}} \star h$ has the same form as
(18.25) with the pulse shape $g$ replaced by $g \star h$. Consequently, by substituting
the FT of $g \star h$ for the FT of $g$ in (18.32),² we obtain that

\[
\text{Power in } X_{\mathrm{BB}} \star h = \frac{A^2}{T_s} \int_{-\infty}^{\infty} \sum_{m=-\infty}^{\infty} K_{CC}(m)\, e^{-i 2\pi f m T_s}\, |\hat{g}(f)|^2\, |\hat{h}(f)|^2 \,\mathrm{d}f \tag{18.48}
\]

and the operational PSD of $X_{\mathrm{BB}}$ is thus

\[
S_{\mathrm{BB}}(f) = \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{CC}(m)\, e^{-i 2\pi f m T_s}\, |\hat{g}(f)|^2, \quad f \in \mathbb{R}. \tag{18.49}
\]

This is the complex analog of (15.21). From (18.49) and (18.44) we now obtain:

Theorem 18.4.3. Under the assumptions of Theorem 18.3.1, the operational PSD
of the QAM signal $(X(t),\ t \in \mathbb{R})$ is given by

\[
S_{XX}(f) = \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{CC}(m)\, e^{-i 2\pi (|f| - f_c) m T_s}\, \bigl| \hat{g}\bigl( |f| - f_c \bigr) \bigr|^2, \quad f \in \mathbb{R}. \tag{18.50}
\]



Proof. The justification of (18.44) is in Theorem 18.6.6. A formal derivation of
the operational PSD of $(X_{\mathrm{BB}}(t),\ t \in \mathbb{R})$ can be found in Section 18.6.5. We draw
the reader's attention to the fact that the proof that we gave for the real case in
Section 15.5 is not directly applicable to the complex case because that proof relied
on Theorem 25.14.1 (Wiener-Khinchin), which we prove in Section 25.14 only for
real WSS stochastic processes.³ □

Figure 18.1 depicts the relationship between the pulse shape $g$ and the operational
PSD of the QAM signal for the case where $K_{CC}(m) = \mathrm{I}\{m = 0\}$ for every $m \in \mathbb{Z}$.
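The formula of Theorem 18.4.3 can be evaluated numerically. The following is a minimal sketch, not a definitive implementation: the function names, the dictionary representation of $K_{CC}$, and the brick-wall choice $|\hat{g}(f)|^2 = T_s\, \mathrm{I}\{|f| \le 1/(2T_s)\}$ are all illustrative assumptions (the brick-wall spectrum is an idealization that does not satisfy the decay condition). It assumes that $K_{CC}(-m) = K_{CC}^*(m)$, so that the sum over $m$ is real.

```python
import cmath
import math


def qam_psd(f, amplitude, symbol_period, carrier_freq, autocov, pulse_ft_sq):
    """Evaluate (18.50). `autocov` maps m -> K_CC(m) (finitely many nonzero
    terms); `pulse_ft_sq` returns |g_hat(.)|^2. Both are caller-supplied."""
    shifted = abs(f) - carrier_freq
    total = sum(k * cmath.exp(-2j * math.pi * shifted * m * symbol_period)
                for m, k in autocov.items())
    # For a valid autocovariance the sum is real; keep only the real part.
    return (amplitude ** 2 / symbol_period) * total.real * pulse_ft_sq(shifted)


def brick_wall_sq(f, symbol_period=0.5):
    """Hypothetical |g_hat(f)|^2 = T_s on |f| <= 1/(2 T_s), zero elsewhere."""
    return symbol_period if abs(f) <= 1 / (2 * symbol_period) else 0.0


# Uncorrelated unit-variance symbols: K_CC(m) = I{m = 0}.
params = dict(amplitude=2.0, symbol_period=0.5, carrier_freq=10.0,
              autocov={0: 1.0}, pulse_ft_sq=brick_wall_sq)
at_carrier = qam_psd(10.0, **params)
at_zero = qam_psd(0.0, **params)
```

In this uncorrelated case the PSD equals $A^2$ on the two bands of width $1/T_s$ around $\pm f_c$ and zero elsewhere, so it integrates to $2A^2/T_s$, in agreement with (18.42) for $\mathsf{E}[|C_\ell|^2] = 1$.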



² We are ignoring here the fact that $g \star h$ need not satisfy the required decay condition.
³ The extension to the complex case is not as trivial as one might think because the real and
imaginary parts of a WSS complex SP need not be WSS.

Figure 18.1: The relationship between the Fourier Transform of the pulse shape
$g(\cdot)$ and the operational PSD of a QAM signal. The symbols $(C_\ell)$ are assumed to
be of zero mean and uncorrelated.

18.4.2 The Operational PSD of QAM in Bi-Infinite Block-Mode

The operational PSD of QAM in bi-infinite block-mode can also be computed
using (18.44). All we need is the operational PSD of $(X_{\mathrm{BB}}(t))$, which can be
computed from (18.36) as follows. As in Section 15.4.2, we note that $X_{\mathrm{BB}} \star h$ has
the same form as (18.25) with the pulse shape $g$ replaced by $g \star h$. Consequently,
by substituting the FT of $g \star h$ for the FT of $g$ in (18.36), we obtain that

\[
\text{Power in } X_{\mathrm{BB}} \star h
= \int_{-\infty}^{\infty} \biggl( \frac{A^2}{N T_s} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}\bigl[ C_\ell C_{\ell'}^* \bigr]\, e^{i 2\pi f (\ell' - \ell) T_s}\, |\hat{g}(f)|^2 \biggr) |\hat{h}(f)|^2 \,\mathrm{d}f, \tag{18.51}
\]

and the operational PSD of $X_{\mathrm{BB}}$ is thus

\[
S_{\mathrm{BB}}(f) = \frac{A^2}{N T_s} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}\bigl[ C_\ell C_{\ell'}^* \bigr]\, e^{i 2\pi f (\ell' - \ell) T_s}\, |\hat{g}(f)|^2, \quad f \in \mathbb{R}. \tag{18.52}
\]

This is the complex analog of (15.23). (But note that, in our present case, $S_{\mathrm{BB}}(\cdot)$
need not be a symmetric function.) From (18.52) and (18.44) we now obtain:

Theorem 18.4.4 (Operational PSD of QAM in Bi-Infinite Block-Mode). Under
the assumptions of Theorem 18.3.2, the operational PSD $S_{XX}$ of the QAM signal
$(X(t),\ t \in \mathbb{R})$ is given for every $f \in \mathbb{R}$ by

\[
S_{XX}(f) = \frac{A^2}{N T_s} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}\bigl[ C_\ell C_{\ell'}^* \bigr]\, e^{i 2\pi (|f| - f_c)(\ell' - \ell) T_s}\, \bigl| \hat{g}\bigl( |f| - f_c \bigr) \bigr|^2. \tag{18.53}
\]

Proof. The justification of (18.44) is in Theorem 18.6.6, and a formal derivation
of the operational PSD of $(X_{\mathrm{BB}}(t))$ is given in Section 18.6.5. □
In this section we formulate conditions under which (18.27) holds, i.e., under which
the power in passband is twice the power in baseband. We first extend the Triangle
Inequality (4.14) to stochastic processes.

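Proposition 18.5.1 (Triangle Inequality for Stochastic Processes). Let $(X(t))$
and $(Y(t))$ be (real or complex) measurable stochastic processes, and let $a < b$ be
arbitrary real numbers. Suppose further that

\[
\mathsf{E}\biggl[ \int_a^b |X(t)|^2 \,\mathrm{d}t \biggr],\ \mathsf{E}\biggl[ \int_a^b |Y(t)|^2 \,\mathrm{d}t \biggr] < \infty. \tag{18.54}
\]

Then

\begin{align}
\Biggl( \sqrt{ \mathsf{E}\biggl[ \int_a^b |X(t)|^2 \,\mathrm{d}t \biggr] } - \sqrt{ \mathsf{E}\biggl[ \int_a^b |Y(t)|^2 \,\mathrm{d}t \biggr] } \Biggr)^2
&\le \mathsf{E}\biggl[ \int_a^b |X(t) + Y(t)|^2 \,\mathrm{d}t \biggr] \notag \\
&\le \Biggl( \sqrt{ \mathsf{E}\biggl[ \int_a^b |X(t)|^2 \,\mathrm{d}t \biggr] } + \sqrt{ \mathsf{E}\biggl[ \int_a^b |Y(t)|^2 \,\mathrm{d}t \biggr] } \Biggr)^2. \tag{18.55}
\end{align}

This also holds when $a$ is replaced with $-\infty$ and/or $b$ is replaced with $+\infty$.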



Proof. Replace all integrals in the proof of (4.14) with expectations of integrals. 

□ 



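We can now state the main result of this section relating power in passband and
baseband.

Theorem 18.5.2. Let $T_s$, $g$, W, and $f_c$ be as in Theorem 18.2.1 and, addition-
ally, assume that $g$ satisfies the decay condition (18.22) and that the CSP $(C_\ell)$ is
bounded in the sense of (18.23). Then the condition

\[
\lim_{T\to\infty} \frac{1}{2T}\, \mathsf{E}\Biggl[ \int_{-T}^{T} \Bigl| \sum_{\ell \in \mathbb{Z}} C_\ell\, g(t - \ell T_s) \Bigr|^2 \mathrm{d}t \Biggr] = P \tag{18.56}
\]

is equivalent to the condition

\[
\lim_{T\to\infty} \frac{1}{2T}\, \mathsf{E}\Biggl[ \int_{-T}^{T} \biggl( 2 \operatorname{Re}\Bigl( \sum_{\ell \in \mathbb{Z}} C_\ell\, g(t - \ell T_s)\, e^{i 2\pi f_c t} \Bigr) \biggr)^2 \mathrm{d}t \Biggr] = 2P. \tag{18.57}
\]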



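The rest of this section is dedicated to proving this theorem. To simplify the
notation we begin by showing that it suffices to prove the result for the case where
$T_s = 1$. If $T_s > 0$ is not necessarily equal to 1, then we define for every $t \in \mathbb{R}$,

\[
\tilde{g}(t) = g(t T_s), \qquad \tilde{W} = W T_s, \qquad \tilde{f}_c = f_c T_s,
\]

and note that $g$ is bandlimited to W/2 Hz if, and only if, $\tilde{g}$ is bandlimited to $\tilde{W}/2$
Hz; that

\[
(f_c > W/2) \iff (\tilde{f}_c > \tilde{W}/2);
\]

and that $g$ satisfies the decay condition (18.22) if, and only if,

\[
|\tilde{g}(t)| \le \frac{\beta}{1 + |t|^{1+\alpha}}, \quad t \in \mathbb{R}.
\]

By defining $\tau = t/T_s$ we obtain that

\[
\frac{1}{2T} \int_{-T}^{T} \Bigl| \sum_{\ell \in \mathbb{Z}} C_\ell\, g(t - \ell T_s) \Bigr|^2 \mathrm{d}t
= \frac{1}{2(T/T_s)} \int_{-T/T_s}^{T/T_s} \Bigl| \sum_{\ell \in \mathbb{Z}} C_\ell\, \tilde{g}(\tau - \ell) \Bigr|^2 \mathrm{d}\tau,
\]

so the power in the mapping $t \mapsto \sum_\ell C_\ell\, g(t - \ell T_s)$ is the same as in the mapping
$\tau \mapsto \sum_\ell C_\ell\, \tilde{g}(\tau - \ell)$. Similarly,

\[
\frac{1}{2T} \int_{-T}^{T} \biggl( 2 \operatorname{Re}\Bigl( \sum_{\ell \in \mathbb{Z}} C_\ell\, g(t - \ell T_s)\, e^{i 2\pi f_c t} \Bigr) \biggr)^2 \mathrm{d}t
= \frac{1}{2(T/T_s)} \int_{-T/T_s}^{T/T_s} \biggl( 2 \operatorname{Re}\Bigl( \sum_{\ell \in \mathbb{Z}} C_\ell\, \tilde{g}(\tau - \ell)\, e^{i 2\pi \tilde{f}_c \tau} \Bigr) \biggr)^2 \mathrm{d}\tau,
\]

so the power in the mapping $t \mapsto 2 \operatorname{Re}\bigl( \sum_\ell C_\ell\, g(t - \ell T_s)\, e^{i 2\pi f_c t} \bigr)$ is the same as in the
mapping $\tau \mapsto 2 \operatorname{Re}\bigl( \sum_\ell C_\ell\, \tilde{g}(\tau - \ell)\, e^{i 2\pi \tilde{f}_c \tau} \bigr)$. Thus, if we establish that the inequality
$\tilde{f}_c > \tilde{W}/2$ implies that the power in the baseband signal $\tau \mapsto \sum_\ell C_\ell\, \tilde{g}(\tau - \ell)$ is equal
to half the power in $\tau \mapsto 2 \operatorname{Re}\bigl( \sum_\ell C_\ell\, \tilde{g}(\tau - \ell)\, e^{i 2\pi \tilde{f}_c \tau} \bigr)$, then it will also follow that
the inequality $f_c > W/2$ implies that the power in $t \mapsto \sum_\ell C_\ell\, g(t - \ell T_s)$ is equal to
half the power in $t \mapsto 2 \operatorname{Re}\bigl( \sum_\ell C_\ell\, g(t - \ell T_s)\, e^{i 2\pi f_c t} \bigr)$.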

Having established that it suffices to prove the theorem for $T_s = 1$, we assume for
the remainder of this section that $T_s = 1$, so the decay condition (18.22) can be
rewritten as

\[
|g(t)| \le \frac{\beta}{1 + |t|^{1+\alpha}}, \quad t \in \mathbb{R}. \tag{18.58}
\]

As in the proof of Theorem 14.5.2, we shall simplify notation and assume that, in
calculating power as the limiting ratio of the energy in the interval $[-T, T]$ to the
length of the interval, $T$ is restricted to the positive integers. The justification is
identical to the one we gave in proving Theorem 14.5.2; see (14.52).

We shall find it convenient to introduce an additional subscript "w" to indicate
"windowing." Thus, if we define $X_{\mathrm{BB}}(\cdot)$ as

\[
X_{\mathrm{BB}}(t) = \sum_{\ell \in \mathbb{Z}} C_\ell\, g(t - \ell), \quad t \in \mathbb{R},
\]

then its windowed version $X_{\mathrm{BB,w}}(\cdot)$ is given by

\[
X_{\mathrm{BB,w}}(t) = \sum_{\ell \in \mathbb{Z}} C_\ell\, g(t - \ell)\, \mathrm{I}\{|t| \le T\}, \quad t \in \mathbb{R}.
\]

Similarly $X_{\mathrm{PB,w}}(\cdot)$ is the windowed version of the SP

\[
X_{\mathrm{PB}}(t) = 2 \operatorname{Re}\bigl( X_{\mathrm{BB}}(t)\, e^{i 2\pi f_c t} \bigr), \quad t \in \mathbb{R},
\]

and $g_{\ell,\mathrm{w}}$ is the windowed version of

\[
g_\ell\colon t \mapsto g(t - \ell), \quad \ell \in \mathbb{Z}. \tag{18.59}
\]

We can now express the power in baseband as the limit, as $T$ tends to infinity, of
$\mathsf{E}\bigl[ \|X_{\mathrm{BB,w}}\|_2^2 \bigr]/(2T)$, and the power in passband as the limit of $\mathsf{E}\bigl[ \|X_{\mathrm{PB,w}}\|_2^2 \bigr]/(2T)$.
Note that, since the function $\mathrm{I}\{\cdot\}$ is real-valued,

\[
X_{\mathrm{PB,w}}(t) = 2 \operatorname{Re}\bigl( X_{\mathrm{BB,w}}(t)\, e^{i 2\pi f_c t} \bigr), \quad t \in \mathbb{R}. \tag{18.60}
\]

But (18.60) notwithstanding, the energy in $X_{\mathrm{PB,w}}$ need not be twice the en-
ergy in $X_{\mathrm{BB,w}}$ because the signal $X_{\mathrm{BB,w}}$, unlike its unwindowed version $X_{\mathrm{BB}}$, is
not bandlimited. It is time-limited, and as such cannot be bandlimited (Theo-
rem 6.8.2).

The difficulty in proving the theorem is in relating the energy in $X_{\mathrm{PB,w}}$ to the
energy in $X_{\mathrm{BB,w}}$ and, specifically, in showing that the difference between half the
energy in $X_{\mathrm{PB,w}}$ and the energy in $X_{\mathrm{BB,w}}$, when normalized by $2T$, tends to zero.
Aiding us in this is the following lemma relating the energy in passband to the
energy in baseband for signals that are not bandlimited.

Lemma 18.5.3. Let $z$ be a complex energy-limited signal that is not necessarily
bandlimited, and consider the real signal $x\colon t \mapsto 2 \operatorname{Re}\bigl( z(t)\, e^{i 2\pi f_c t} \bigr)$, where $f_c > 0$ is
arbitrary. Then,

\[
\bigl( \|z\|_2 - \sqrt{2}\, \epsilon \bigr)^2 \le \frac{1}{2} \|x\|_2^2 \le \bigl( \|z\|_2 + \sqrt{2}\, \epsilon \bigr)^2, \tag{18.61}
\]

where

\[
\epsilon^2 = \int_{-\infty}^{-f_c} |\hat{z}(f)|^2 \,\mathrm{d}f. \tag{18.62}
\]

Proof. Expressing the FT of $x$ in terms of the FT of $z$, we obtain that for every
$f \in \mathbb{R}$ outside a set of frequencies of Lebesgue measure zero,

\begin{align}
\hat{x}(f)\, \mathrm{I}\{f \ge 0\}
&= \hat{z}(f - f_c)\, \mathrm{I}\{f \ge 0\} + \hat{z}^*(-f - f_c)\, \mathrm{I}\{f \ge 0\} \notag \\
&= \hat{z}(f - f_c) + \hat{z}^*(-f - f_c)\, \mathrm{I}\{f \ge 0\} - \hat{z}(f - f_c)\, \mathrm{I}\{f < 0\}. \tag{18.63}
\end{align}

We next consider the integral over $f$ of the squared magnitude of the LHS and of
the RHS of (18.63). Since $x$ is real, its FT is conjugate-symmetric so, by Parseval's
Theorem, the integral of the squared magnitude of the LHS of (18.63) is $\frac{1}{2} \|x\|_2^2$.
The integral of the squared magnitude of the first term on the RHS of (18.63) is
given by $\|z\|_2^2$. Finally, the integral of the squared magnitude of each of the last
two terms on the RHS of (18.63) is $\epsilon^2$ and, since they are orthogonal, the integral
of the squared magnitude of their sum is $2\epsilon^2$. The result now follows from the
Triangle Inequality (4.14). □
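As a sanity check on (18.61), consider the special case, stated here only as an illustration, in which $\hat{z}(f)$ vanishes for all $f < -f_c$ (for example, when $z$ is bandlimited to W/2 Hz and $f_c > W/2$). Then the out-of-band term of (18.62) is zero and the two bounds coincide:

```latex
\epsilon^2 = \int_{-\infty}^{-f_c} |\hat{z}(f)|^2 \,\mathrm{d}f = 0
\quad\Longrightarrow\quad
\|z\|_2^2 \;\le\; \tfrac{1}{2}\,\|x\|_2^2 \;\le\; \|z\|_2^2
\quad\Longrightarrow\quad
\|x\|_2^2 = 2\,\|z\|_2^2 .
```

This recovers the familiar statement that the energy in passband is twice the energy in baseband; the lemma quantifies how far this can fail when $z$ has spectral content below $-f_c$.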

Applying Lemma 18.5.3 with the substitution of $x_{\mathrm{BB,w}}$ for $z$ and of $x_{\mathrm{PB,w}}$ for $x$
we obtain upon noting that $f_c > W/2$ that, in order to establish the theorem, it
suffices to show that the "out-of-band energy" term

\[
e^2 = \int_{|f| > W/2} |\hat{x}_{\mathrm{BB,w}}(f)|^2 \,\mathrm{d}f \tag{18.64}
\]

satisfies

\[
\lim_{T\to\infty} \frac{1}{T}\, e^2 = 0, \tag{18.65}
\]

with the convergence being uniform. That is, we need to show that $e^2/T$ is upper-
bounded by some function of $\alpha$, $\beta$, $\gamma$, and $T$ that converges to zero as $T$ tends to
infinity with $\alpha$, $\beta$, $\gamma$ held fixed. Aiding us in the calculation of the out-of-band
energy is the following lemma.

Lemma 18.5.4. Let $x$ be an energy-limited signal and let $W > 0$.

(i) If $u$ is any energy-limited signal that is bandlimited to W/2 Hz, then

\[
\int_{|f| > W/2} |\hat{x}(f)|^2 \,\mathrm{d}f \le \|x - u\|_2^2. \tag{18.66}
\]

(ii) In particular,

\[
\int_{|f| > W/2} |\hat{x}(f)|^2 \,\mathrm{d}f \le \|x\|_2^2. \tag{18.67}
\]

Proof. Part (ii) follows from Parseval's Theorem. Part (i) follows by noting that
if $u$ is an energy-limited signal that is bandlimited to W/2 Hz, then the Fourier
Transforms of $x$ and $x - u$ are indistinguishable for frequencies $f$ that satisfy
$|f| > W/2$. Consequently

\begin{align*}
\int_{|f| > W/2} |\hat{x}(f)|^2 \,\mathrm{d}f
&= \int_{|f| > W/2} |\hat{x}(f) - \hat{u}(f)|^2 \,\mathrm{d}f \\
&\le \|x - u\|_2^2,
\end{align*}

where the inequality follows by applying Part (ii) to the signal $x - u$. □

To prove (18.65) fix some integer $\nu \ge 2$ and express $x_{\mathrm{BB,w}}$ as

\[
x_{\mathrm{BB,w}} = s_{0,\mathrm{w}} + s_{1,\mathrm{w}} + s_{2,\mathrm{w}}, \tag{18.68}
\]

where

\begin{align}
s_{0,\mathrm{w}} &= \sum_{0 \le |\ell| \le T - \nu} c_\ell\, g_{\ell,\mathrm{w}}, \tag{18.69} \\
s_{1,\mathrm{w}} &= \sum_{T - \nu < |\ell| \le T + \nu} c_\ell\, g_{\ell,\mathrm{w}}, \tag{18.70} \\
s_{2,\mathrm{w}} &= \sum_{T + \nu < |\ell| < \infty} c_\ell\, g_{\ell,\mathrm{w}}, \tag{18.71}
\end{align}

are of corresponding out-of-band energies

\[
e_\kappa^2 = \int_{|f| > W/2} |\hat{s}_{\kappa,\mathrm{w}}(f)|^2 \,\mathrm{d}f, \quad \kappa = 0, 1, 2. \tag{18.72}
\]

Note that by (18.64), (18.68), and the Triangle Inequality

\[
e^2 \le (e_0 + e_1 + e_2)^2. \tag{18.73}
\]

Since the integer $\nu \ge 2$ is arbitrary, it follows from (18.73) that, to establish (18.65)
and to thus complete the proof of the theorem, it suffices to show that for every
fixed integer $\nu \ge 2$,

\begin{align}
\lim_{T\to\infty} \frac{1}{T}\, e_0^2 &= 0, \tag{18.74} \\
\lim_{T\to\infty} \frac{1}{T}\, e_1^2 &= 0, \tag{18.75}
\end{align}

and that

\[
\lim_{\nu\to\infty} \Bigl( \varlimsup_{T\to\infty} \frac{1}{T}\, e_2^2 \Bigr) = 0. \tag{18.76}
\]

We thus conclude the theorem's proof by establishing (18.74), (18.75), and (18.76).

We begin with the easiest, namely (18.75). To establish (18.75) we recall the
definition of $e_1$ (18.72) & (18.70) and use the Triangle Inequality to obtain

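\begin{align}
e_1 &\le \sum_{T - \nu < |\ell| \le T + \nu} |c_\ell| \biggl( \int_{|f| > W/2} |\hat{g}_{\ell,\mathrm{w}}(f)|^2 \,\mathrm{d}f \biggr)^{1/2} \notag \\
&\le \gamma \sum_{T - \nu < |\ell| \le T + \nu} \|g_{\ell,\mathrm{w}}\|_2 \notag \\
&\le 4 \gamma \nu\, \|g\|_2, \tag{18.77}
\end{align}

where the second inequality follows from (18.23) and from Lemma 18.5.4 (ii), and
where the final inequality follows because windowing can only reduce energy so
$\|g_{\ell,\mathrm{w}}\|_2 \le \|g_\ell\|_2 = \|g\|_2$. Inequality (18.77) establishes (18.75).

Having established (18.75), we next turn to proving (18.74). The proof is quite
similar except that, instead of using Part (ii) of Lemma 18.5.4, we use Part (i) with
the substitutions of $g_{\ell,\mathrm{w}}$ for $x$ and of $g_\ell$ for $u$ to obtain

\[
\int_{|f| > W/2} |\hat{g}_{\ell,\mathrm{w}}(f)|^2 \,\mathrm{d}f \le \|g_{\ell,\mathrm{w}} - g_\ell\|_2^2, \quad \ell \in \mathbb{Z}. \tag{18.78}
\]

We further upper-bound the RHS of (18.78) using the decay condition (18.58) as

\begin{align*}
\|g_{\ell,\mathrm{w}} - g_\ell\|_2^2
&= \int_{-\infty}^{\infty} |g_\ell(t)|^2\, \mathrm{I}\{|t| > T\} \,\mathrm{d}t \\
&= \int_{-\infty}^{-T} |g(t - \ell)|^2 \,\mathrm{d}t + \int_{T}^{\infty} |g(t - \ell)|^2 \,\mathrm{d}t \\
&= \int_{-\infty}^{-T - \ell} |g(\tau)|^2 \,\mathrm{d}\tau + \int_{T - \ell}^{\infty} |g(\tau)|^2 \,\mathrm{d}\tau \\
&\le \int_{-\infty}^{-T - \ell} \frac{\beta^2}{\bigl( 1 + |\tau|^{1+\alpha} \bigr)^2} \,\mathrm{d}\tau + \int_{T - \ell}^{\infty} \frac{\beta^2}{\bigl( 1 + |\tau|^{1+\alpha} \bigr)^2} \,\mathrm{d}\tau \\
&\le 2 \int_{T - |\ell|}^{\infty} \frac{\beta^2}{\tau^{2 + 2\alpha}} \,\mathrm{d}\tau \\
&= \frac{2 \beta^2}{(1 + 2\alpha)\, (T - |\ell|)^{1 + 2\alpha}}, \quad |\ell| < T,
\end{align*}

to obtain

\[
\biggl( \int_{|f| > W/2} |\hat{g}_{\ell,\mathrm{w}}(f)|^2 \,\mathrm{d}f \biggr)^{1/2} \le \frac{\sqrt{2}\, \beta}{\sqrt{1 + 2\alpha}\, (T - |\ell|)^{1/2 + \alpha}}, \quad |\ell| < T. \tag{18.79}
\]

Using (18.72), (18.69), (18.79), (18.23), and the Triangle Inequality we thus obtain

\begin{align}
e_0 &\le \sum_{0 \le |\ell| \le T - \nu} |c_\ell| \biggl( \int_{|f| > W/2} |\hat{g}_{\ell,\mathrm{w}}(f)|^2 \,\mathrm{d}f \biggr)^{1/2} \notag \\
&\le \sqrt{\frac{2 \gamma^2 \beta^2}{1 + 2\alpha}} \sum_{0 \le |\ell| \le T - \nu} \frac{1}{(T - |\ell|)^{1/2 + \alpha}} \notag \\
&\le 2 \sqrt{\frac{2 \gamma^2 \beta^2}{1 + 2\alpha}} \sum_{\ell = 0}^{T - \nu} \frac{1}{(T - \ell)^{1/2 + \alpha}} \notag \\
&= 2 \sqrt{\frac{2 \gamma^2 \beta^2}{1 + 2\alpha}} \sum_{\tilde{\ell} = \nu}^{T} \frac{1}{\tilde{\ell}^{\,1/2 + \alpha}} \notag \\
&\le 2 \sqrt{\frac{2 \gamma^2 \beta^2}{1 + 2\alpha}} \int_{\nu - 1}^{T} \frac{1}{\xi^{1/2 + \alpha}} \,\mathrm{d}\xi \notag \\
&= \begin{cases}
2 \sqrt{\dfrac{2 \gamma^2 \beta^2}{1 + 2\alpha}}\; \dfrac{1}{1/2 - \alpha} \Bigl( T^{1/2 - \alpha} - (\nu - 1)^{1/2 - \alpha} \Bigr) & \text{if } \alpha \ne 1/2, \\[2ex]
2 \gamma \beta \bigl( \ln T - \ln(\nu - 1) \bigr) & \text{if } \alpha = 1/2,
\end{cases} \tag{18.80}
\end{align}

where the inequality in the first line follows from (18.72) and from the Triangle
Inequality; the inequality in the second line from (18.79); the inequality in the
third line by counting the term $\ell = 0$ twice; the equality in the fourth line by
changing the summation variable to $\tilde{\ell} = T - \ell$; the inequality in the fifth line from
the monotonicity of the function $\xi \mapsto \xi^{-1/2 - \alpha}$, which implies that

\[
\tilde{\ell}^{\,-1/2 - \alpha} \le \int_{\tilde{\ell} - 1}^{\tilde{\ell}} \frac{1}{\xi^{1/2 + \alpha}} \,\mathrm{d}\xi;
\]

and where the final equality on the sixth line follows by direct calculation. Inequal-
ity (18.80) combines with our assumption that $\alpha$ is positive to prove (18.74).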
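The sum-to-integral step in (18.80) and the resulting decay of $e_0^2/T$ can be checked numerically. The sketch below is only a sanity check under illustrative parameter choices ($\beta = \gamma = 1$ and a few values of $\alpha$, $\nu$, $T$); the function names are hypothetical.

```python
import math


def e0_constant(alpha, beta=1.0, gamma=1.0):
    """The common factor 2 * sqrt(2 gamma^2 beta^2 / (1 + 2 alpha))."""
    return 2 * math.sqrt(2) * gamma * beta / math.sqrt(1 + 2 * alpha)


def e0_sum(T, nu, alpha):
    """Fourth line of (18.80): the sum over l~ = nu, ..., T."""
    return e0_constant(alpha) * sum(k ** -(0.5 + alpha) for k in range(nu, T + 1))


def e0_bound(T, nu, alpha):
    """Final line of (18.80): the closed-form value of the integral bound."""
    if alpha == 0.5:
        return e0_constant(alpha) * (math.log(T) - math.log(nu - 1))
    p = 0.5 - alpha
    return e0_constant(alpha) * (T ** p - (nu - 1) ** p) / p


# The normalized squared bound shrinks as T grows, as (18.74) requires.
ratios = [e0_bound(T, 2, 0.25) ** 2 / T for T in (100, 10000, 1000000)]
```

For $\alpha < 1/2$ the bound grows like $T^{1/2 - \alpha}$, so its square divided by $T$ decays like $T^{-2\alpha}$; for $\alpha \ge 1/2$ the bound is at most logarithmic in $T$.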

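We now conclude the proof of the theorem by establishing (18.76). To that end, we
begin by using Lemma 18.5.4 (ii) and the fact that $s_{2,\mathrm{w}}$ is zero outside the interval
$[-T, T]$ to obtain

\[
e_2^2 \le \int_{-T}^{T} |s_{2,\mathrm{w}}(t)|^2 \,\mathrm{d}t. \tag{18.81}
\]

We next upper-bound the RHS of (18.81) using the boundedness of the symbols
(18.23) and the decay condition (18.58):

\begin{align}
|s_{2,\mathrm{w}}(t)| &= \Bigl| \sum_{T + \nu < |\ell| < \infty} c_\ell\, g_{\ell,\mathrm{w}}(t) \Bigr| \notag \\
&\le \gamma \sum_{T + \nu < |\ell| < \infty} |g(t - \ell)|\, \mathrm{I}\{|t| \le T\} \notag \\
&\le \gamma \sum_{T + \nu < |\ell| < \infty} \frac{\beta}{1 + |t - \ell|^{1+\alpha}}\, \mathrm{I}\{|t| \le T\} \notag \\
&\le \gamma \beta \sum_{T + \nu < |\ell| < \infty} \frac{1}{\bigl| |\ell| - |t| \bigr|^{1+\alpha}}\, \mathrm{I}\{|t| \le T\} \notag \\
&\le \gamma \beta \sum_{T + \nu < |\ell| < \infty} \frac{1}{(|\ell| - T)^{1+\alpha}} \notag \\
&= 2 \gamma \beta \sum_{\ell = T + \nu + 1}^{\infty} \frac{1}{(\ell - T)^{1+\alpha}} \notag \\
&= 2 \gamma \beta \sum_{\tilde{\ell} = \nu + 1}^{\infty} \frac{1}{\tilde{\ell}^{\,1+\alpha}} \notag \\
&\le 2 \gamma \beta \int_{\nu}^{\infty} \xi^{-1-\alpha} \,\mathrm{d}\xi \notag \\
&= \frac{2 \gamma \beta}{\alpha}\, \nu^{-\alpha}, \tag{18.82}
\end{align}

where the equality in the first line follows from the definition of $s_{2,\mathrm{w}}$ (18.71); the
inequality in the second line from the Triangle Inequality for Complex Numbers
(2.12), the boundedness of $(C_\ell)$ (18.23), and from the definition of $g_\ell$ (18.59); the
inequality in the third line from (18.58); the inequality in the fourth line because
$|\xi - \zeta| \ge \bigl| |\xi| - |\zeta| \bigr|$ whenever $\xi, \zeta \in \mathbb{R}$; the inequality in the fifth line because for
$|t| > T$ the LHS is zero and the RHS is positive, and because for $|t| \le T$ we have
that $|\ell| - |t| \ge |\ell| - T$ throughout the range of summation; the equality in the
sixth line from the symmetry of the summand and from the assumption that $T$ is
an integer; the equality in the seventh line by changing the summation variable to
$\tilde{\ell} = \ell - T$; the inequality in the eighth line from the monotonicity of the function
$\xi \mapsto \xi^{-1-\alpha}$, which implies that

\[
\frac{1}{\tilde{\ell}^{\,1+\alpha}} \le \int_{\tilde{\ell} - 1}^{\tilde{\ell}} \frac{1}{\xi^{1+\alpha}} \,\mathrm{d}\xi;
\]

and the final equality in the ninth line by evaluating the integral.
It follows from (18.82) and (18.81) that

\[
e_2^2 \le \frac{8 \gamma^2 \beta^2}{\alpha^2}\, \nu^{-2\alpha}\, T \tag{18.83}
\]

and hence that

\[
\varlimsup_{T\to\infty} \frac{1}{T}\, e_2^2 \le \frac{8 \gamma^2 \beta^2}{\alpha^2}\, \nu^{-2\alpha}, \quad \nu \ge 2,
\]

which proves (18.76).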
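The tail estimate used in the last two lines of (18.82) can also be checked numerically. The sketch below compares a truncated partial sum of $\sum_{k > \nu} k^{-1-\alpha}$ with the bound $\nu^{-\alpha}/\alpha$. Since a partial sum only underestimates the infinite tail, this is a consistency check and not a proof; the truncation length and function names are arbitrary illustrative choices.

```python
def tail_partial_sum(nu, alpha, terms=100000):
    """Partial sum of k^-(1 + alpha) over k = nu + 1, ..., nu + terms."""
    return sum(k ** -(1 + alpha) for k in range(nu + 1, nu + 1 + terms))


def tail_bound(nu, alpha):
    """Value of the integral of xi^-(1 + alpha) over [nu, infinity)."""
    return nu ** -alpha / alpha


def e2_limit_bound(nu, alpha, beta=1.0, gamma=1.0):
    """Bound on limsup e_2^2 / T implied by (18.83)."""
    return 8 * gamma ** 2 * beta ** 2 * nu ** (-2 * alpha) / alpha ** 2
```

Because `e2_limit_bound` decays like $\nu^{-2\alpha}$, it can be made arbitrarily small by choosing $\nu$ large, which is exactly what (18.76) requires.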

18.6 A Formal Account of the PSD in Baseband and Passband 

In this section we justify the derivations of Section 18.4. 

18.6.1 On Limits of Convolutions 

We begin with a lemma that justifies the swapping of infinite summation and 
convolution. As a corollary we establish conditions under which feeding a (real or 
complex) PAM signal of pulse shape g to a stable filter of impulse response h is 
tantamount to replacing its pulse shape g with the new pulse shape g * h. 

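Lemma 18.6.1. Let $s_1, s_2, \ldots$ be a sequence of measurable functions from $\mathbb{R}$ to $\mathbb{C}$
satisfying the following two conditions:

1) The sequence is uniformly bounded in the sense that there exists some positive
number $\sigma_\infty$ such that

\[
|s_\ell(t)| \le \sigma_\infty, \quad \bigl( t \in \mathbb{R},\ \ell = 1, 2, \ldots \bigr). \tag{18.84}
\]

2) The sequence converges to some function $s$ uniformly over compact sets in
the sense that for every fixed $\xi > 0$

\[
\lim_{\ell\to\infty}\, \sup_{|t| \le \xi} |s(t) - s_\ell(t)| = 0. \tag{18.85}
\]

Then for every $h \in \mathcal{L}_1$,

\[
\lim_{\ell\to\infty} (s_\ell \star h)(t) = (s \star h)(t), \quad t \in \mathbb{R}. \tag{18.86}
\]

Proof. Fix some epoch $t_0 \in \mathbb{R}$ and some $h \in \mathcal{L}_1$. We will show that for every
$\epsilon > 0$ there exists some $L_0 \in \mathbb{N}$ (depending on $\epsilon$) such that

\[
\bigl| (s_\ell \star h)(t_0) - (s \star h)(t_0) \bigr| < \epsilon, \quad \ell \ge L_0. \tag{18.87}
\]

To that end note that our assumption that $h$ is integrable implies that there exists
some $\xi > 0$ such that

\[
\int_{|\tau| > \xi} |h(\tau)| \,\mathrm{d}\tau < \frac{\epsilon}{3 \sigma_\infty}. \tag{18.88}
\]

And when we apply our assumption that the sequence $s_1, s_2, \ldots$ converges to $s$
uniformly over compact sets to the compact interval $[t_0 - \xi, t_0 + \xi]$, we obtain that
there exists some $L_0$ (depending on $\epsilon$, $t_0$, and $\xi$) such that

\[
\|h\|_1 \sup_{t_0 - \xi \le \tau \le t_0 + \xi} |s(\tau) - s_\ell(\tau)| < \frac{\epsilon}{3}, \quad \ell \ge L_0. \tag{18.89}
\]

We can now derive (18.87) as follows:

\begin{align*}
\bigl| (s_\ell \star h)(t_0) - (s \star h)(t_0) \bigr|
&= \biggl| \int_{-\infty}^{\infty} s_\ell(t_0 - \tau)\, h(\tau) \,\mathrm{d}\tau - \int_{-\infty}^{\infty} s(t_0 - \tau)\, h(\tau) \,\mathrm{d}\tau \biggr| \\
&= \biggl| \int_{|\tau| \le \xi} \bigl( s_\ell(t_0 - \tau) - s(t_0 - \tau) \bigr)\, h(\tau) \,\mathrm{d}\tau
 - \int_{|\tau| > \xi} s(t_0 - \tau)\, h(\tau) \,\mathrm{d}\tau
 + \int_{|\tau| > \xi} s_\ell(t_0 - \tau)\, h(\tau) \,\mathrm{d}\tau \biggr| \\
&\le \int_{|\tau| \le \xi} |s_\ell(t_0 - \tau) - s(t_0 - \tau)|\, |h(\tau)| \,\mathrm{d}\tau
 + \int_{|\tau| > \xi} |s(t_0 - \tau)\, h(\tau)| \,\mathrm{d}\tau
 + \int_{|\tau| > \xi} |s_\ell(t_0 - \tau)\, h(\tau)| \,\mathrm{d}\tau \\
&\le \|h\|_1 \Bigl( \sup_{t_0 - \xi \le \tau \le t_0 + \xi} |s(\tau) - s_\ell(\tau)| \Bigr) + 2 \sigma_\infty \int_{|\tau| > \xi} |h(\tau)| \,\mathrm{d}\tau \\
&< \epsilon,
\end{align*}

where the last inequality follows from (18.88) and (18.89). □

Corollary 18.6.2. If the sequence $(C_\ell)$ is bounded in the sense of (18.23) and if the
measurable function $g$ satisfies the decay condition (18.22), then for every $h \in \mathcal{L}_1$
and every epoch $t_0 \in \mathbb{R}$

\[
\biggl( \Bigl( t \mapsto \sum_{\ell \in \mathbb{Z}} C_\ell\, g(t - \ell T_s) \Bigr) \star h \biggr)(t_0) = \sum_{\ell \in \mathbb{Z}} C_\ell\, (g \star h)(t_0 - \ell T_s). \tag{18.90}
\]

Proof. Follows by applying Lemma 18.6.1 to the functions

\[
s_L\colon t \mapsto \sum_{\ell = -L}^{L} C_\ell\, g(t - \ell T_s), \quad L = 1, 2, \ldots \qquad \Box
\]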

18.6.2 On the Support of the Operational PSD of $X_{\mathrm{BB}}$

We next prove that if the pulse shape $g$ is bandlimited to W/2 Hz, then the
operational PSD of $X_{\mathrm{BB}}$ is zero at frequencies outside the band $[-W/2, W/2]$.
That is, we justify (18.46).

Proposition 18.6.3. Assume that $A$, $T_s$, $g$, W, and $f_c$ are as in Theorem 18.2.1
and, additionally, that $g$ satisfies the decay condition (18.22) and that the CSP
$(C_\ell)$ is bounded in the sense of (18.23). If the CSP $(X_{\mathrm{BB}}(t),\ t \in \mathbb{R})$ of (18.25) is
of operational PSD $S_{\mathrm{BB}}(\cdot)$, then $S_{\mathrm{BB}}(f)$ is zero for all $|f| > W/2$ outside a set of
Lebesgue measure zero, and consequently

\[
f \mapsto S_{\mathrm{BB}}(f)\, \mathrm{I}\Bigl\{ |f| \le \frac{W}{2} \Bigr\}
\]

is also an operational PSD for $(X_{\mathrm{BB}}(t),\ t \in \mathbb{R})$.

Proof. We shall show that the proposition's hypotheses imply that if $h \in \mathcal{L}_1$ is
such that $\hat{h}(f) = 0$ at all frequencies $f$ satisfying $|f| \le W/2$, then the power in
$X_{\mathrm{BB}} \star h$ is zero, irrespective of the values of $\hat{h}(f)$ at other frequencies. That is, we
shall show that

\[
\bigl( \hat{h}(f) = 0,\ |f| \le W/2 \bigr) \implies \bigl( \text{Power in } X_{\mathrm{BB}} \star h = 0 \bigr), \quad h \in \mathcal{L}_1. \tag{18.91}
\]

Since $X_{\mathrm{BB}}$ is, by assumption, of operational PSD $S_{\mathrm{BB}}(\cdot)$, it will then follow from
(18.91) that

\[
\bigl( \hat{h}(f) = 0,\ |f| \le W/2 \bigr) \implies \biggl( \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f)\, |\hat{h}(f)|^2 \,\mathrm{d}f = 0 \biggr), \quad h \in \mathcal{L}_1. \tag{18.92}
\]

From (18.92) it is just a technicality to show that the nonnegative function $S_{\mathrm{BB}}(\cdot)$
must be zero at all frequencies $|f| > W/2$ outside a set of Lebesgue measure
zero. Indeed, if, in order to reach a contradiction, we assume that $S_{\mathrm{BB}}(\cdot)$ is not
indistinguishable from the all-zero function in some interval $[a, b]$, where $a$ and $b$
are such that $W/2 < a < b$, then picking $h$ as an integrable function such that
$\hat{h}(f)$ is zero for $|f| \le W/2$ and such that $\hat{h}(f) = 1$ for $a \le f \le b$ would yield
a contradiction to (18.92). (An example of such a function $h$ is the IFT of the
shifted-trapezoid mapping

\[
f \mapsto \begin{cases}
1 & \text{if } a \le f \le b, \\
0 & \text{if } f \le W/2 \text{ or } f \ge b + (a - W/2), \\
1 - \dfrac{|f - (a+b)/2| - (b-a)/2}{a - W/2} & \text{otherwise},
\end{cases}
\qquad f \in \mathbb{R},
\]

which is a frequency shifted version of the function we encountered in (7.15) and
(7.17).) The assumption that $S_{\mathrm{BB}}(\cdot)$ is not indistinguishable from the all-zero
function in some interval $[a, b]$ where $a < b < -W/2$ can be similarly contradicted.
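The shifted-trapezoid mapping from the proof can be written out directly. The sketch below is a plain transcription of its three cases; the function name and the numerical values of $a$, $b$, and W in the example are hypothetical and chosen only so that the ramps have unit width.

```python
def shifted_trapezoid(f, a, b, W):
    """Frequency response equal to 1 on [a, b], 0 for f <= W/2 and for
    f >= b + (a - W/2), and linear in between (requires W/2 < a < b)."""
    ramp = a - W / 2  # width of each linear transition region
    if a <= f <= b:
        return 1.0
    if f <= W / 2 or f >= b + ramp:
        return 0.0
    return 1.0 - (abs(f - (a + b) / 2) - (b - a) / 2) / ramp


# Example with W = 2, a = 2, b = 3: flat top on [2, 3], ramps on (1, 2) and (3, 4).
values = [shifted_trapezoid(f, a=2.0, b=3.0, W=2.0) for f in (0.0, 1.5, 2.5, 3.5, 4.0)]
```

The mapping vanishes on the whole band $|f| \le W/2$ (here $|f| \le 1$) and also at all negative frequencies, which is what the contradiction argument needs.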

To complete the proof we thus need to justify (18.91). This follows from two
observations. The first is that, by Corollary 18.6.2, for every $h \in \mathcal{L}_1$

\[
\text{Power in } X_{\mathrm{BB}} \star h = \text{Power in } t \mapsto A \sum_{\ell \in \mathbb{Z}} C_\ell\, (g \star h)(t - \ell T_s). \tag{18.93}
\]

The second is that, because $g$ is an integrable function that is bandlimited to W/2
Hz, it follows from Proposition 6.5.2 that

\[
(g \star h)(t) = \int_{-W/2}^{W/2} \hat{g}(f)\, \hat{h}(f)\, e^{i 2\pi f t} \,\mathrm{d}f, \quad t \in \mathbb{R}
\]

and, in particular,

\[
\bigl( \hat{h}(f) = 0,\ |f| \le W/2 \bigr) \implies \bigl( g \star h = 0 \bigr), \quad h \in \mathcal{L}_1. \tag{18.94}
\]

Combining (18.93) and (18.94) establishes (18.91). □

18.6.3 On the Definition of the Operational PSD

In order to demonstrate that $(Z(t),\ t \in \mathbb{R})$ is of operational PSD $S_{ZZ}$, one has
to show that (18.43) holds for every function $h\colon \mathbb{R} \to \mathbb{C}$ in $\mathcal{L}_1$ (Definition 18.4.1).
It turns out that it suffices to establish (18.43) only for functions that are in a
subset of $\mathcal{L}_1$, provided that the subset is sufficiently rich. This result will allow
us to consider only functions $h$ of compact support. To make this result precise
we need the following definition. We say that the set $\mathcal{H}$ is a dense subset of $\mathcal{L}_1$
if $\mathcal{H}$ is a subset of $\mathcal{L}_1$ such that for every $h \in \mathcal{L}_1$ there exists a sequence $h_1, h_2, \ldots$
of elements of $\mathcal{H}$ such that $\lim_{\nu\to\infty} \|h - h_\nu\|_1 = 0$. An example of a dense subset
of $\mathcal{L}_1$ is the subset of functions of compact support, where a function $h\colon \mathbb{R} \to \mathbb{C}$ is
said to be of compact support if there exists some $\Delta > 0$ such that

\[
h(t) = 0, \quad |t| \ge \Delta. \tag{18.95}
\]

Lemma 18.6.4 (On Functions of Compact Support).

(i) The set of integrable functions of compact support is a dense subset of $\mathcal{L}_1$.

(ii) If $h$ is of compact support and if $g$ satisfies the decay condition (18.22) with
parameters $\alpha, \beta, T_s > 0$, then $g \star h$ also satisfies this decay condition with the
same parameters $\alpha$ and $T_s$ but with a possibly different parameter $\beta'$.

Proof. We begin with Part (i). Given any integrable function $h$ (not necessarily
of compact support) we define the sequence of integrable functions of compact
support $h_1, h_2, \ldots$ by $h_\nu\colon t \mapsto h(t)\, \mathrm{I}\{|t| \le \nu\}$ for every $\nu \in \mathbb{N}$. It is then just a
technicality to show that $\|h - h_\nu\|_1$ converges to zero. (This can be shown using
the Dominated Convergence Theorem because $|h_\nu(t)| \le |h(t)|$ for all $t \in \mathbb{R}$ and
because $h$ is integrable.)
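As an illustration of Part (i), take the hypothetical example $h(t) = e^{-|t|}$, for which truncating to $|t| \le \nu$ leaves the $\mathcal{L}_1$ error $\|h - h_\nu\|_1 = 2 e^{-\nu}$. The sketch below estimates this error with a midpoint rule; the integration cutoff and step count are arbitrary choices made for the example.

```python
import math


def truncation_error(nu, upper=60.0, steps=50000):
    """Midpoint-rule estimate of ||h - h_nu||_1 for h(t) = exp(-|t|), i.e. of
    twice the integral of exp(-t) over (nu, upper). The mass beyond `upper`
    is negligible for the values of nu used here."""
    width = (upper - nu) / steps
    one_sided = sum(math.exp(-(nu + (k + 0.5) * width)) for k in range(steps)) * width
    return 2 * one_sided


errors = [truncation_error(nu) for nu in (1, 3, 5)]
```

The estimates track the closed form $2 e^{-\nu}$ and decrease to zero as $\nu$ grows, which is the convergence that the Dominated Convergence Theorem guarantees for a general integrable $h$.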

We next prove Part (ii). Let g satisfy the decay condition (18.22) with the positive 
parameters a, /3,T S , and let A > be such that (18.95) is satisfied. We shall prove 
the lemma by showing that 

l (g * h)(t) ^ i + (ITO^ ' te% (18 - 96) 

where 

/3' = /?||h||, 2 1+ «(l + (2A/T s ) 1+ «). (18.97) 

To that end we shall first show that 

|(g*h)(i)| </?||h|| J; iet (18.98) 

and 

|(g^h)ft)l</?||h|| j 2 1 +" IT ^— - , |t|>2A. (18.99) 

We shall then proceed to show that the RHS of (18.96) is larger than the RHS of 
(18.98) for |i| < 2A and that it is larger than the RHS of (18.99) for |i| > 2A. 

Both (18.98) and (18.99) follow from the bound

$$\begin{aligned}
|(g \star h)(t)| &= \biggl| \int_{t-A}^{t+A} g(\tau)\, h(t-\tau)\, d\tau \biggr| \\
&\le \int_{t-A}^{t+A} |g(\tau)|\, |h(t-\tau)|\, d\tau \\
&\le \int_{t-A}^{t+A} \Bigl( \sup_{t-A \le \sigma \le t+A} |g(\sigma)| \Bigr) |h(t-\tau)|\, d\tau \\
&= \|h\|_1 \sup_{t-A \le \sigma \le t+A} |g(\sigma)|
\end{aligned}$$

as follows. Bound (18.98) simply follows by using (18.22) to upper-bound $|g(t)|$
by $\beta$. And Bound (18.99) follows by using (18.22) to upper-bound $|g(t)|$ for $|t| > A$
by $\beta / \bigl(1 + ((|t| - A)/T_s)^{1+\alpha}\bigr)$, and by then upper-bounding this latter expression
in the range $|t| > 2A$ by $\beta\, 2^{1+\alpha} / \bigl(1 + (|t|/T_s)^{1+\alpha}\bigr)$ because in this range

$$\begin{aligned}
1 + \Bigl(\frac{|t| - A}{T_s}\Bigr)^{1+\alpha} &\ge 1 + \Bigl(\frac{|t|}{T_s}\Bigr)^{1+\alpha} \Bigl(\frac{1}{2}\Bigr)^{1+\alpha} \\
&\ge 2^{-(1+\alpha)} + 2^{-(1+\alpha)} \Bigl(\frac{|t|}{T_s}\Bigr)^{1+\alpha}, \quad |t| > 2A.
\end{aligned}$$

Having established (18.98) and (18.99) we now complete the proof by showing that
the RHS of (18.96) upper-bounds the RHS of (18.98) whenever $|t| \le 2A$, and
that it upper-bounds the RHS of (18.99) for $|t| > 2A$. That the RHS of (18.96)
upper-bounds the RHS of (18.98) whenever $|t| \le 2A$ follows because

$$\frac{\beta\, \|h\|_1\, 2^{1+\alpha} \bigl(1 + (2A/T_s)^{1+\alpha}\bigr)}{1 + (|t|/T_s)^{1+\alpha}} \ge \beta\, \|h\|_1\, 2^{1+\alpha} \ge \beta\, \|h\|_1, \quad |t| \le 2A.$$

And that the RHS of (18.96) upper-bounds the RHS of (18.99) whenever $|t| > 2A$
follows because the term $1 + (2A/T_s)^{1+\alpha}$ is larger than one. □
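As a numerical sanity check of Part (ii), the following sketch takes one illustrative pair of functions (both are hypothetical choices, not taken from the text): a pulse g that meets the decay condition (18.22) with equality, and a box function h supported on [-A, A]. It evaluates g ⋆ h with the midpoint rule and compares the result with the RHS of (18.96), using β' from (18.97). The function names and parameter values are assumptions made for the example.

```python
import math

def g(t, beta, alpha, Ts):
    # Illustrative pulse meeting the decay condition (18.22) with equality.
    return beta / (1.0 + (abs(t) / Ts) ** (1.0 + alpha))

def conv_with_box(t, A, beta, alpha, Ts, n=400):
    # (g * h)(t) for h = indicator of [-A, A]: integrate g over [t - A, t + A]
    # with the midpoint rule.
    step = 2.0 * A / n
    return step * sum(g(t - A + (k + 0.5) * step, beta, alpha, Ts) for k in range(n))

def beta_prime(beta, alpha, Ts, A, h_l1):
    # The constant beta' of (18.97); h_l1 is the L1 norm of h.
    return beta * h_l1 * 2.0 ** (1.0 + alpha) * (1.0 + (2.0 * A / Ts) ** (1.0 + alpha))

def bound_holds(A=1.5, beta=1.0, alpha=0.5, Ts=1.0):
    # Check (18.96) on a grid of t values; the box h has L1 norm 2A.
    bp = beta_prime(beta, alpha, Ts, A, h_l1=2.0 * A)
    for k in range(-300, 301):
        t = 0.1 * k
        rhs = bp / (1.0 + (abs(t) / Ts) ** (1.0 + alpha))
        if abs(conv_with_box(t, A, beta, alpha, Ts)) > rhs:
            return False
    return True

print(bound_holds())
```

A single example of course proves nothing; it only illustrates that the constant β' in (18.97) is generous enough to cover both the range |t| ≤ 2A and the range |t| > 2A.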

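Proposition 18.6.5. Assume that $\mathcal{H}$ is a dense subset of $\mathcal{L}_1$ and that the (real or
complex) measurable stochastic process $(Z(t), t \in \mathbb{R})$ is bounded in the sense that
for some $\sigma_\infty$

$$|Z(t)| \le \sigma_\infty, \quad t \in \mathbb{R}. \tag{18.100}$$

If $S(\cdot)$ is a nonnegative integrable function such that the relation

$$\text{Power in } Z \star h = \int_{-\infty}^{\infty} S(f)\, |\hat{h}(f)|^2\, df \tag{18.101}$$

holds for every $h \in \mathcal{H}$, then it holds for all $h \in \mathcal{L}_1$.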

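Proof. Let $h$ be an element of $\mathcal{L}_1$ (but not necessarily of $\mathcal{H}$) for which we would
like to prove (18.101). Since $\mathcal{H}$ is a dense subset of $\mathcal{L}_1$, there exists a sequence
$h_1, h_2, \ldots$ of elements of $\mathcal{H}$

$$h_\nu \in \mathcal{H}, \quad \nu = 1, 2, \ldots \tag{18.102}$$

such that

$$\lim_{\nu \to \infty} \|h - h_\nu\|_1 = 0. \tag{18.103}$$

We shall prove that (18.101) holds for $h$ by justifying the calculation

$$\text{Power in } Z \star h = \lim_{\nu \to \infty} \text{Power in } Z \star h_\nu \tag{18.104}$$

$$= \lim_{\nu \to \infty} \int_{-\infty}^{\infty} S(f)\, |\hat{h}_\nu(f)|^2\, df \tag{18.105}$$

$$= \int_{-\infty}^{\infty} S(f)\, |\hat{h}(f)|^2\, df. \tag{18.106}$$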

The justification of (18.105) is that, by (18.102), each of the functions $h_\nu$ is in $\mathcal{H}$,
and the proposition's hypothesis guarantees that (18.101) holds for such functions.

The justification of (18.106) is a bit technical. It is based on noting that (18.103)
implies (by Theorem 6.2.11 (i) with the substitution of $h - h_\nu$ for $x$) that

$$\lim_{\nu \to \infty} \hat{h}_\nu(f) = \hat{h}(f), \quad f \in \mathbb{R} \tag{18.107}$$

and by then using the Dominated Convergence Theorem to justify the swapping of
the limit and integral. Indeed, (by Theorem 6.2.11 (i)) for every $\nu \in \mathbb{N}$, the function
$f \mapsto S(f)\, |\hat{h}_\nu(f)|^2$ is bounded by the function $f \mapsto \bigl(\sup_\nu \|h_\nu\|_1\bigr)^2 S(f)$, which is
integrable because $S(\cdot)$ is integrable (by the proposition's hypothesis) and because
the integrability of $h$ and (18.103) imply that the supremum is finite, as can be
verified using the Triangle Inequality by writing $h_\nu$ as $h - (h - h_\nu)$.

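We now complete the proof by justifying (18.104). Since $Z \star h_\nu = Z \star h - Z \star (h - h_\nu)$,
it follows from the Triangle Inequality for Stochastic Processes (Proposition 18.5.1)
that for every $T > 0$

$$\begin{aligned}
\Biggl| \sqrt{\mathrm{E}\biggl[\int_{-T}^{T} |(Z \star h_\nu)(t)|^2\, dt\biggr]} - \sqrt{\mathrm{E}\biggl[\int_{-T}^{T} |(Z \star h)(t)|^2\, dt\biggr]} \Biggr|
&\le \sqrt{\mathrm{E}\biggl[\int_{-T}^{T} \bigl|\bigl(Z \star (h - h_\nu)\bigr)(t)\bigr|^2\, dt\biggr]} \\
&\le \sqrt{2T}\, \sigma_\infty\, \|h - h_\nu\|_1,
\end{aligned} \tag{18.108}$$

where the second inequality follows from (18.100) using (5.8c). Upon dividing by
$\sqrt{2T}$ and taking the limit of $T \to \infty$, it now follows from (18.108) that

$$\Bigl| \sqrt{\text{Power in } Z \star h_\nu} - \sqrt{\text{Power in } Z \star h} \Bigr| \le \sigma_\infty\, \|h - h_\nu\|_1,$$

from which (18.104) follows by (18.103). □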



18.6.4 Relating the Operational PSD in Passband and Baseband 

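We next make the relationship (18.44) between the operational PSD of $X$ and the
operational PSD of $X_{\mathrm{BB}}$ formal.

Theorem 18.6.6. Under the assumptions of Proposition 18.6.3, if the complex
stochastic process $(X_{\mathrm{BB}}(t), t \in \mathbb{R})$ of (18.25) is of operational PSD $S_{\mathrm{BB}}(\cdot)$ in
the sense that $S_{\mathrm{BB}}(\cdot)$ is an integrable function satisfying that for every complex
$h_c \in \mathcal{L}_1$

$$\lim_{T \to \infty} \frac{1}{2T}\, \mathrm{E}\biggl[\int_{-T}^{T} |(X_{\mathrm{BB}} \star h_c)(t)|^2\, dt\biggr] = \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f)\, |\hat{h}_c(f)|^2\, df, \tag{18.109}$$

then the QAM real SP $(X(t), t \in \mathbb{R})$ of (18.24) is of operational PSD

$$S_{\mathrm{PB}}(f) = S_{\mathrm{BB}}(f - f_c) + S_{\mathrm{BB}}(-f - f_c), \quad f \in \mathbb{R} \tag{18.110}$$

in the sense that $S_{\mathrm{PB}}(\cdot)$ is an integrable symmetric function such that for every
real $h_r \in \mathcal{L}_1$

$$\lim_{T \to \infty} \frac{1}{2T}\, \mathrm{E}\biggl[\int_{-T}^{T} |(X \star h_r)(t)|^2\, dt\biggr] = \int_{-\infty}^{\infty} S_{\mathrm{PB}}(f)\, |\hat{h}_r(f)|^2\, df. \tag{18.111}$$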

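Proof. The hypothesis that $S_{\mathrm{BB}}(\cdot)$ is integrable clearly implies that $S_{\mathrm{PB}}(\cdot)$, as
defined in (18.110), is integrable and symmetric. It remains to show that if (18.109)
holds for every complex $h_c \in \mathcal{L}_1$, then (18.111) must hold for every real $h_r \in \mathcal{L}_1$.
Since the set of integrable functions of compact support is a dense subset of $\mathcal{L}_1$
(Lemma 18.6.4 (i)), it follows from Proposition 18.6.5 that it suffices to establish
(18.111) for real functions $h_r$ that are of compact support. Let $h_r$ be such a
function. The following calculation demonstrates that passing the QAM signal $X$
through a filter of impulse response $h_r$ is tantamount to replacing its pulse shape $g$
with the pulse shape consisting of the convolution of $g$ with the complex signal
$\tau \mapsto e^{-i 2\pi f_c \tau} h_r(\tau)$:

$$\begin{aligned}
(X \star h_r)(t) &= \Bigl( \bigl(\tau \mapsto 2 \operatorname{Re}\bigl(X_{\mathrm{BB}}(\tau)\, e^{i 2\pi f_c \tau}\bigr)\bigr) \star h_r \Bigr)(t) \\
&= 2 \operatorname{Re}\Bigl( \bigl( (\tau \mapsto X_{\mathrm{BB}}(\tau)\, e^{i 2\pi f_c \tau}) \star h_r \bigr)(t) \Bigr) \\
&= 2 \operatorname{Re}\Bigl( e^{i 2\pi f_c t} \bigl( X_{\mathrm{BB}} \star (\tau \mapsto e^{-i 2\pi f_c \tau} h_r(\tau)) \bigr)(t) \Bigr) \\
&= 2 \operatorname{Re}\Bigl( e^{i 2\pi f_c t} A \sum_{\ell=-\infty}^{\infty} C_\ell \bigl( g \star (\tau \mapsto e^{-i 2\pi f_c \tau} h_r(\tau)) \bigr)(t - \ell T_s) \Bigr) \\
&= 2 \operatorname{Re}\Bigl( A \sum_{\ell=-\infty}^{\infty} C_\ell\, (g \star h_c)(t - \ell T_s)\, e^{i 2\pi f_c t} \Bigr),
\end{aligned} \tag{18.112}$$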



where the first equality follows from the definition of $X$ in terms of $X_{\mathrm{BB}}$; the second
because $h_r$ is real (see (7.38) on the convolution between a real and a complex
signal); the third from Proposition 7.8.1; the fourth from Corollary 18.6.2; and
where the fifth equality follows by defining the mapping

$$h_c \colon t \mapsto e^{-i 2\pi f_c t}\, h_r(t). \tag{18.113}$$

Note that by (18.113)

$$\hat{h}_c(f) = \hat{h}_r(f + f_c), \quad f \in \mathbb{R}. \tag{18.114}$$

It follows from (18.112) that $X \star h_r$ has the form of a QAM signal with pulse shape
$g \star h_c$. We note that, because $g$ (by hypothesis) satisfies the decay condition (18.22)
and because the fact that $h_r$ is of compact support implies by (18.113) that $h_c$ is
also of compact support, it follows from Lemma 18.6.4 (ii) that the pulse shape
$g \star h_c$ satisfies the decay condition

$$|(g \star h_c)(t)| \le \frac{\beta'}{1 + (|t|/T_s)^{1+\alpha}}, \quad t \in \mathbb{R} \tag{18.115}$$

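for some positive $\beta'$. Consequently, we can apply Theorem 18.5.2 to obtain that
the power of $X \star h_r$ is given by

$$\begin{aligned}
\text{Power in } X \star h_r &= 2\, \text{Power in } t \mapsto A \sum_{\ell=-\infty}^{\infty} C_\ell\, (g \star h_c)(t - \ell T_s) \\
&= 2\, \text{Power in } \Bigl( t \mapsto A \sum_{\ell=-\infty}^{\infty} C_\ell\, g(t - \ell T_s) \Bigr) \star h_c \\
&= 2\, \text{Power in } (X_{\mathrm{BB}} \star h_c) \\
&= 2 \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f)\, |\hat{h}_c(f)|^2\, df \\
&= 2 \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f)\, |\hat{h}_r(f + f_c)|^2\, df \\
&= 2 \int_{-\infty}^{\infty} S_{\mathrm{BB}}(\tilde{f} - f_c)\, |\hat{h}_r(\tilde{f})|^2\, d\tilde{f} \\
&= \int_{-\infty}^{\infty} \bigl( S_{\mathrm{BB}}(f - f_c) + S_{\mathrm{BB}}(-f - f_c) \bigr)\, |\hat{h}_r(f)|^2\, df,
\end{aligned} \tag{18.116}$$

where the second equality follows from Corollary 18.6.2; the third by the definition
of $X_{\mathrm{BB}}$; the fourth because, by hypothesis, $X_{\mathrm{BB}}$ is of operational PSD $S_{\mathrm{BB}}(\cdot)$; the
fifth from (18.114); the sixth by changing the integration variable to $\tilde{f} = f + f_c$;
and the seventh from the conjugate symmetry of $\hat{h}_r(\cdot)$.

Since $h_r$ was an arbitrary integrable real function of compact support, (18.116)
establishes (18.111) for all such functions. □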

Corollary 18.6.7. Under the assumptions of Theorem 18.6.6, the QAM signal
$(X(t), t \in \mathbb{R})$ is of operational PSD

$$S_{XX}(f) = S_{\mathrm{BB}}(|f| - f_c), \quad f \in \mathbb{R}. \tag{18.117}$$

Proof. Follows from the theorem by noting that, by Proposition 18.6.3 and by the
assumption that $f_c > W/2$,

$$S_{\mathrm{BB}}(f - f_c) + S_{\mathrm{BB}}(-f - f_c) = S_{\mathrm{BB}}(|f| - f_c)$$

at all frequencies $f$ outside a set of frequencies of Lebesgue measure zero. □
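The step from (18.110) to (18.117) can be illustrated with a small sketch. It assumes, as the proof suggests, that the baseband PSD vanishes outside [-W/2, W/2]; the particular ramp-shaped `s_bb` below and all parameter values are hypothetical choices made only for illustration.

```python
def s_bb(f, W=2.0):
    # Hypothetical baseband operational PSD: an asymmetric ramp on [-W/2, W/2],
    # zero elsewhere.
    half = W / 2.0
    return 1.0 + 0.5 * f / half if abs(f) <= half else 0.0

def s_pb(f, fc, W=2.0):
    # Passband PSD as in (18.110).
    return s_bb(f - fc, W) + s_bb(-f - fc, W)

def s_xx(f, fc, W=2.0):
    # Compact form of (18.117).
    return s_bb(abs(f) - fc, W)

def max_gap(fc=5.0, W=2.0):
    # Largest difference between the two expressions on a frequency grid.
    grid = [0.01 * k for k in range(-1000, 1001)]
    return max(abs(s_pb(f, fc, W) - s_xx(f, fc, W)) for f in grid)

print(max_gap())          # carrier well above W/2: the two forms coincide
print(max_gap(fc=0.5))    # carrier below W/2: the two shifted copies overlap
```

When the carrier frequency exceeds W/2, at most one of the two shifted copies is nonzero at any frequency, which is why the sum collapses to a single term. With a carrier below W/2 the copies overlap and the two expressions differ.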

18.6.5 On the Operational PSD in Baseband 

In the calculation of the operational PSD of the QAM signal $(X(t))$ via (18.44)
(which is formally stated as Corollary 18.6.7) we needed the operational PSD of
the CSP $(X_{\mathrm{BB}}(t))$ of (18.25). In this section we justify the calculations of this
operational PSD that lead to Theorems 18.4.3 and 18.4.4. Specifically, we show:

Proposition 18.6.8 (Operational PSD of a Complex PAM Signal). Let the CSP
$(X_{\mathrm{BB}}(t), t \in \mathbb{R})$ be given by (18.25), where $A > 0$, $T_s > 0$, and where $g$ is a
complex Borel measurable function satisfying the decay condition (18.22) for some
constants $\alpha, \beta > 0$.

(i) If $(C_\ell)$ is a bounded, zero-mean, WSS CSP of autocovariance function $K_{CC}$,
i.e., if it satisfies (18.23) and (18.28), then the CSP $(X_{\mathrm{BB}}(t), t \in \mathbb{R})$ is of
operational PSD $S_{\mathrm{BB}}(\cdot)$ as given in (18.49).

(ii) If $(C_\ell)$ is produced in bi-infinite block-mode from IID random bits using an
encoder $\mathrm{enc} \colon \{0,1\}^K \to \mathbb{C}^N$ that produces zero-mean symbols from IID
random bits, then $(X_{\mathrm{BB}}(t), t \in \mathbb{R})$ is of operational PSD $S_{\mathrm{BB}}(\cdot)$ as given in
(18.52).

Proof. We have all the ingredients that are needed to justify our derivations of
(18.49) and (18.52). All that remains is to piece them together. Let $h$ be any
complex integrable function of compact support. Then



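$$\begin{aligned}
\text{Power in } X_{\mathrm{BB}} \star h &= \text{Power in } \Bigl( t \mapsto A \sum_{\ell \in \mathbb{Z}} C_\ell\, g(t - \ell T_s) \Bigr) \star h \\
&= \text{Power in } t \mapsto A \sum_{\ell \in \mathbb{Z}} C_\ell\, (g \star h)(t - \ell T_s),
\end{aligned} \tag{18.118}$$

where the first equality follows from the definition of $X_{\mathrm{BB}}$ (18.25), and where the
second equality follows from Corollary 18.6.2. Note that by Lemma 18.6.4 (ii) the
function $g \star h$ satisfies the decay condition (18.96) for some $\beta' > 0$.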

To prove Part (i) we now employ Theorem 14.6.4 (which extends to the case where
the pulse shape and the symbols are complex) with the pulse shape $g \star h$ to obtain
from (18.118) that

$$\text{Power in } X_{\mathrm{BB}} \star h = \frac{A^2}{T_s} \int_{-\infty}^{\infty} \sum_{m=-\infty}^{\infty} K_{CC}(m)\, e^{-i 2\pi f m T_s}\, |\hat{g}(f)|^2\, |\hat{h}(f)|^2\, df, \tag{18.119}$$

for every integrable complex $h$ of compact support. It follows from the fact
that the set of integrable functions of compact support is a dense subset of $\mathcal{L}_1$
(Lemma 18.6.4 (i)) and from Proposition 18.6.5 that (18.119) must hold for all
integrable functions. Recalling the definition of the operational PSD
(Definition 18.4.1), it follows that $(X_{\mathrm{BB}}(t), t \in \mathbb{R})$ is of operational PSD $S_{\mathrm{BB}}(\cdot)$ as
given in (18.49).

The proof of Part (ii) is very similar except that we compute the RHS of (18.118)
using (18.36) with the substitution of $g \star h$ for the pulse shape. □

18.7 Exercises 

Exercise 18.1 (The Second Moment of the Square QAM Constellation). 

(i) Show that picking $X$ and $Y$ IID uniformly over the set in (10.19) results in $X + iY$
being uniformly distributed over the set in (16.19).

(ii) Compute the second moment of the square $2\nu \times 2\nu$ QAM constellation (16.19).

Exercise 18.2 (Optimal Constellations). Let $\mathcal{C}$ denote a QAM constellation, and define
for every $z \in \mathbb{C}$ the constellation $\mathcal{C}' = \{c - z : c \in \mathcal{C}\}$.

(i) Relate the minimum distance of $\mathcal{C}'$ to that of $\mathcal{C}$.
(ii) Relate the second moment of $\mathcal{C}'$ to that of $\mathcal{C}$.
(iii) How would you choose $z$ to minimize the second moment of $\mathcal{C}'$?

Exercise 18.3 (The Power in Baseband Is Real). Show that the RHS of (18.29) is real.
Which properties of the autocovariance function $K_{CC}$ and of the self-similarity
function $R_{gg}$ are you exploiting?

Exercise 18.4 ($\pi/4$-QPSK). In QPSK or 4-QAM the data bits are mapped to complex
symbols $(C_\ell)$ which take value in the set $\{\pm 1 \pm i\}$ and which are then transmitted using the
signal $(X(t))$ defined in (18.24). Consider now $\pi/4$-QPSK where, prior to transmission,
the complex symbols $(C_\ell)$ are rotated to form the complex symbols

$$\tilde{C}_\ell = \alpha^\ell C_\ell, \quad \ell \in \mathbb{Z},$$

where $\alpha = e^{i\pi/4}$. The transmitted signal is then

$$2A \operatorname{Re}\Bigl( \sum_{\ell=-\infty}^{\infty} \tilde{C}_\ell\, g(t - \ell T_s)\, e^{i 2\pi f_c t} \Bigr), \quad t \in \mathbb{R}.$$

Compute the power and the operational PSD of the $\pi/4$-QPSK signal when $(C_\ell)$ is a
zero-mean WSS CSP of autocovariance function $K_{CC}$. Compare the power and operational
PSD of $\pi/4$-QPSK with those of QPSK. How do they compare when the symbols $(C_\ell)$
are IID?

Hint: See Exercise 17.12.

Exercise 18.5 (The Bandwidth of the QAM Signal). Formulate and prove a result anal- 
ogous to Theorem 15.4.1 for QAM. 

Exercise 18.6 (Bandwidth and Power in PAM and QAM). Data bits $(D_j)$ are generated
at rate $R_b$ bits per second.

(i) The bits are mapped to real symbols using a $(K, N)$ binary-to-reals block-encoder
of rate $K/N$ bits per real symbol. The symbols are mapped to a PAM signal
of pulse shape $\phi$ whose time shifts by integer multiples of $T_s$ are orthonormal
and whose excess bandwidth is $\eta$. Find the bandwidth of the transmitted signal
(Definition 15.3.4).

(ii) Repeat for the bandwidth around the carrier frequency $f_c$ in QAM when the bits
are mapped to complex symbols using a $(K, N)$ binary-to-complex block-encoder
of rate $K/N$ bits per complex symbol. (As in Part (i), the pulse shape is of excess
bandwidth $\eta$.)

(iii) Show that if we express the rate $\rho$ of the block-encoder in both cases in bits per
complex symbol, then in the former case $\rho = 2K/N$; in the latter case $\rho = K/N$;
and in both cases the bandwidth can be expressed as the same function of $R_b$, $\rho$,
and $\eta$.

(iv) Show that for both PAM and QAM the transmitted power is given by 

9 

provided that the energy per symbol $E_s$ and the rate $\rho$ are computed in both cases
per complex symbol.

Hint: Exercise 18.5 is useful for Part (ii). 

Exercise 18.7 (Operational PSD of Differential PSK). Let the bi-infinite sequence of IID
random bits $(D_j, j \in \mathbb{Z})$ be mapped to the complex symbols $(C_\ell, \ell \in \mathbb{Z})$ as follows:

$$C_{\ell+1} = C_\ell \exp\Bigl( i \frac{\pi}{4} \bigl(4 D_{3\ell} + 2 D_{3\ell+1} + D_{3\ell+2}\bigr) \Bigr), \quad \ell = 0, 1, 2, \ldots$$

$$C_\ell = C_{\ell+1} \exp\Bigl( -i \frac{\pi}{4} \bigl(4 D_{3\ell} + 2 D_{3\ell+1} + D_{3\ell+2}\bigr) \Bigr), \quad \ell = \ldots, -2, -1,$$

where $C_0$ is independent of $(D_j)$ and uniformly distributed over the set

$$\bigl\{ 1, e^{i\pi/4}, e^{i 2\pi/4}, e^{i 3\pi/4}, \ldots, e^{i 7\pi/4} \bigr\}.$$

Find the operational PSD of the QAM signal under the assumptions of Section 18.3 on
the pulse shape.

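Exercise 18.8 (PAM/QAM). Let $D_1, \ldots, D_k$ be IID random bits. These bits are mapped
by a mapping $\varphi_{\mathrm{QAM}} \colon \{0,1\}^k \to \mathbb{C}^n$ to the complex symbols $C_1, \ldots, C_n$, which are then
mapped to the QAM signal

$$X_{\mathrm{QAM}}(t; D_1, \ldots, D_k) = 2A \operatorname{Re}\Bigl( \sum_{\ell=1}^{n} C_\ell\, \phi_{\mathrm{QAM}}(t - \ell T_{s,\mathrm{QAM}})\, e^{i 2\pi f_c t} \Bigr), \quad t \in \mathbb{R},$$

where the time shifts of $\phi_{\mathrm{QAM}}$ by integer multiples of $T_{s,\mathrm{QAM}}$ are orthonormal.
Define the real symbols $X_1, \ldots, X_{2n}$ by

$$X_{2\ell-1} = \operatorname{Re}(C_\ell), \quad X_{2\ell} = \operatorname{Im}(C_\ell), \quad \ell \in \{1, \ldots, n\}$$

and the corresponding PAM signal

$$X_{\mathrm{PAM}}(t; D_1, \ldots, D_k) = A \sum_{\ell=1}^{2n} X_\ell\, \phi_{\mathrm{PAM}}(t - \ell T_{s,\mathrm{PAM}}), \quad t \in \mathbb{R},$$

where $\phi_{\mathrm{PAM}}$ is real and its time shifts by integer multiples of $T_{s,\mathrm{PAM}}$ are orthonormal.

(i) Relate the expected energy in $X_{\mathrm{QAM}}$ to that in $X_{\mathrm{PAM}}$.
(ii) Relate the minimum squared distance

$$\min_{(d_1, \ldots, d_k) \ne (d_1', \ldots, d_k')} \int_{-\infty}^{\infty} \bigl( x_{\mathrm{QAM}}(t; d_1, \ldots, d_k) - x_{\mathrm{QAM}}(t; d_1', \ldots, d_k') \bigr)^2\, dt$$

to

$$\min_{(d_1, \ldots, d_k) \ne (d_1', \ldots, d_k')} \int_{-\infty}^{\infty} \bigl( x_{\mathrm{PAM}}(t; d_1, \ldots, d_k) - x_{\mathrm{PAM}}(t; d_1', \ldots, d_k') \bigr)^2\, dt.$$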

Exercise 18.9 (The Operational PSD is Nonnegative). Show that if the CSP $(Z(t), t \in \mathbb{R})$
is of operational PSD $S_{ZZ}$, then $S_{ZZ}(f)$ must be nonnegative outside a set of frequencies
of Lebesgue measure zero.

Hint: See Exercise 15.5. 



Chapter 19 

The Univariate Gaussian Distribution 

19.1 Introduction 

In many communication scenarios the noise is modeled as a Gaussian stochastic 
process. This is sometimes justified by invoking a Central Limit Theorem, which 
demonstrates that many small independent disturbances add up to a stochastic 
process that is approximately Gaussian. Another justification is mathematical 
convenience: while Gaussian processes may seem daunting at first, they are actually 
well understood and often amenable to analysis. Finally, particularly in wireline 
communications, the Gaussian model is justified because it leads to robust results 
and to good engineering design. For other scenarios, e.g., fast-moving wireless 
mobile communications, more intricate models are needed. 

Rather than starting immediately with the definition and analysis of Gaussian 
stochastic processes, we shall take the more moderate approach and start by first 
discussing Gaussian random variables. Building on that, we shall later discuss 
Gaussian random vectors in Chapter 23, and only then introduce continuous-time 
Gaussian stochastic processes in Chapter 25. 

19.2 Standard Gaussian Random Variables 

We begin with a special kind of Gaussian: the standard Gaussian. 

Definition 19.2.1 (Standard Gaussian). We say that the random variable $W$ is a
standard Gaussian or that it has a standard Gaussian distribution, if its
density function $f_W(\cdot)$ is given by

$$f_W(w) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{w^2}{2}}, \quad w \in \mathbb{R}. \tag{19.1}$$

This density is depicted in Figure 19.1. For this definition to be meaningful, the
RHS of (19.1) had better be a valid density function, i.e., be nonnegative and
integrate to one. This is indeed the case. In fact, the RHS of (19.1) is positive,
and it integrates to one because, as we next show,

$$\int_{-\infty}^{\infty} e^{-w^2/2}\, dw = \sqrt{2\pi}.$$

Figure 19.1: The standard Gaussian density function.



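This integral can be verified by computing its square as follows:

$$\begin{aligned}
\biggl( \int_{-\infty}^{\infty} e^{-\frac{w^2}{2}}\, dw \biggr)^2 &= \int_{-\infty}^{\infty} e^{-\frac{w^2}{2}}\, dw \int_{-\infty}^{\infty} e^{-\frac{v^2}{2}}\, dv \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{w^2 + v^2}{2}}\, dw\, dv \\
&= \int_{0}^{\infty} \int_{-\pi}^{\pi} r\, e^{-\frac{r^2}{2}}\, d\varphi\, dr \\
&= 2\pi \int_{0}^{\infty} r\, e^{-\frac{r^2}{2}}\, dr \\
&= 2\pi \Bigl( -e^{-r^2/2} \Bigr) \Big|_{0}^{\infty} \\
&= 2\pi,
\end{aligned} \tag{19.2}$$

where the first equality follows by writing $a^2$ as $a$ times $a$; the second by writing
the product of the integrals as a double integral over $\mathbb{R}^2$; the third by changing
from Cartesian to polar coordinates:

$$w = r \cos\varphi, \quad v = r \sin\varphi, \quad r \ge 0, \quad -\pi \le \varphi < \pi, \quad dw\, dv = r\, dr\, d\varphi;$$

the fourth because the integrand does not depend on $\varphi$; the fifth because the
derivative of $-e^{-r^2/2}$ is $r\, e^{-r^2/2}$; and where the final equality follows by direct
evaluation.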
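The value of the integral can also be checked numerically. The sketch below uses a plain midpoint rule on a truncated interval; the truncation point and the number of subintervals are arbitrary choices for illustration, and the function name is hypothetical.

```python
import math

def gaussian_integral(limit=12.0, n=24_000):
    # Midpoint rule for the integral of exp(-w^2 / 2) over [-limit, limit].
    # The tails beyond |w| = 12 are far below double-precision resolution.
    step = 2.0 * limit / n
    return step * sum(
        math.exp(-0.5 * (-limit + (k + 0.5) * step) ** 2) for k in range(n)
    )

approx = gaussian_integral()
print(approx, math.sqrt(2.0 * math.pi))
```

The two printed numbers should agree to many digits, which is consistent with the polar-coordinates argument above.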

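Note that the density (19.1) of a standard Gaussian random variable is symmetric.
Consequently, if $W$ is a standard Gaussian, then so is $-W$. This symmetry also
establishes that the expectation of a standard Gaussian is zero. The variance of a
standard Gaussian can be computed using integration by parts:

$$\begin{aligned}
\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} w^2\, e^{-\frac{w^2}{2}}\, dw &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} w\, \frac{d}{dw}\Bigl( -e^{-\frac{w^2}{2}} \Bigr)\, dw \\
&= \frac{1}{\sqrt{2\pi}} \biggl( -w\, e^{-\frac{w^2}{2}} \Big|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-\frac{w^2}{2}}\, dw \biggr) \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{w^2}{2}}\, dw \\
&= 1,
\end{aligned}$$

where the last equality follows from (19.2).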

19.3 Gaussian Random Variables 

We next define a Gaussian (not necessarily standard) random variable as the result 
of applying an affine transformation to a standard Gaussian. 

Definition 19.3.1 (Centered Gaussians and Gaussians). We say that a random
variable $X$ is a centered Gaussian or that it has a centered Gaussian
distribution if it can be written in the form

$$X = aW \tag{19.3}$$

for some deterministic $a \in \mathbb{R}$ and for some standard Gaussian $W$. We say that
the random variable $X$ is Gaussian or that it has a Gaussian distribution if

$$X = aW + b \tag{19.4}$$

for some deterministic $a, b \in \mathbb{R}$ and for some standard Gaussian $W$.

Note 19.3.2. We do not preclude $a$ from being zero. The case $a = 0$ leads to $X$
being deterministically equal to $b$. We thus include the deterministic random
variables in the family of Gaussian random variables.

Note 19.3.3. The family of Gaussian random variables is closed with respect to
affine transformations: if $X$ is Gaussian and $\alpha, \beta \in \mathbb{R}$ are deterministic, then
$\alpha X + \beta$ is also Gaussian.

Proof. Since $X$ is Gaussian, it can be written as $X = aW + b$, where $W$ is a
standard Gaussian. Consequently

$$\alpha X + \beta = \alpha(aW + b) + \beta = (\alpha a) W + (\alpha b + \beta),$$

which has the form $a'W + b'$ for some deterministic $a', b' \in \mathbb{R}$. □



If (19.4) holds, then the random variables on its RHS and LHS must have the same
mean. The mean of a standard Gaussian is zero, so the mean of the RHS of (19.4)
is $b$. The LHS is of mean $\mathrm{E}[X]$, and we thus conclude that in the representation
(19.4) the deterministic constant $b$ is uniquely determined by the mean of $X$, and
in fact,

$$b = \mathrm{E}[X].$$

Similarly, since the variance of a standard Gaussian is one, the variance of the RHS
of (19.4) is $a^2$. And since the variance of the LHS is $\mathrm{Var}[X]$, we conclude that

$$a^2 = \mathrm{Var}[X].$$

Up to its sign, the deterministic constant $a$ in the representation (19.4) is thus also
unique.

Based on the above, one might mistakenly think that for any given mean $\mu$ and
variance $\sigma^2$ there are two different Gaussian distributions corresponding to

$$\sigma W + \mu \quad \text{and} \quad -\sigma W + \mu, \tag{19.5}$$

where $W$ is a standard Gaussian. This, however, is not the case:

Note 19.3.4. There is only one Gaussian distribution of a given mean and variance.

Proof. This can be seen in two different ways. The first is to note that the two
representations in (19.5) lead to the same distribution, because the standard
Gaussian $W$ has a symmetric distribution, so $\sigma W$ and $-\sigma W$ have the same distribution.
The second is based on computing the density of $\sigma W + \mu$ and showing that it is a
symmetric function of $\sigma$; see (19.6) ahead. □

Having established that there is only one Gaussian distribution of a given mean $\mu$
and variance $\sigma^2$, we denote it by

$$\mathcal{N}(\mu, \sigma^2)$$

and set out to study its density. Since the distribution does not depend on the
sign of $\sigma$, it is customary to require that $\sigma$ be nonnegative and to refer to it as the
standard deviation. Thus, $\sigma^2$ is the variance and $\sigma$ is the standard deviation.
If $\sigma^2 = 0$, then the Gaussian distribution is deterministic with mean $\mu$ and has
no density.¹ If $\sigma^2 > 0$, then the density can be computed from the density of
the standard Gaussian distribution as follows. If $X \sim \mathcal{N}(\mu, \sigma^2)$, then $X$ has the
same distribution as $\mu + \sigma W$, where $W$ is a standard Gaussian, because both $X$
and $\mu + \sigma W$ are of mean $\mu$ and variance $\sigma^2$ ($W$ is zero-mean and unit-variance);
both are Gaussian (Note 19.3.3); and Gaussians of identical means and variances
have identical distributions (Note 19.3.4). The density of $X$ is thus identical to the
density of $\mu + \sigma W$. The density of the latter can be computed from the density
of $W$ (19.1) to obtain that the density of a $\mathcal{N}(\mu, \sigma^2)$ Gaussian random variable of
positive variance is

$$\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad x \in \mathbb{R}. \tag{19.6}$$

¹ Some would say that the density of a deterministic random variable is given by Dirac's Delta,
but we prefer not to use generalized functions in this book.

Figure 19.2: The Gaussian density function with mean $\mu$ and variance $\sigma^2$.

This density is depicted in Figure 19.2. To derive the density of $\mu + \sigma W$ from
that of $W$, we have used the fact that if $X = g(W)$, where $g(\cdot)$ is a deterministic
continuously differentiable function whose derivative never vanishes (in our case
$g(w) = \mu + \sigma w$) and where $W$ is of density $f_W(\cdot)$ (in our case (19.1)), then the
density $f_X(\cdot)$ of $X$ is given by:

$$f_X(x) = \begin{cases} 0 & \text{if for no } \xi \text{ is } x = g(\xi), \\ \dfrac{1}{|g'(\xi)|}\, f_W(\xi) & \text{if } \xi \text{ satisfies } x = g(\xi), \end{cases} \tag{19.7}$$

where $g'(\xi)$ denotes the derivative of $g(\cdot)$ at $\xi$. (For a more formal multivariate
version of this fact see Theorem 17.3.4.)
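The change-of-variable rule (19.7) can be made concrete for the affine map g(w) = μ + σw. The sketch below computes the density both ways, through (19.7) and through the closed form (19.6); the function names are hypothetical and chosen only for this illustration.

```python
import math

def std_pdf(w):
    # Standard Gaussian density (19.1).
    return math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)

def pdf_via_change_of_variable(x, mu, sigma):
    # Apply (19.7) with g(w) = mu + sigma * w: the unique solution of
    # x = g(xi) is xi = (x - mu) / sigma, and |g'(xi)| = |sigma|.
    xi = (x - mu) / sigma
    return std_pdf(xi) / abs(sigma)

def pdf_closed_form(x, mu, sigma2):
    # The density (19.6) of a Gaussian of mean mu and positive variance sigma2.
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

x, mu = 1.3, 0.5
print(pdf_via_change_of_variable(x, mu, 2.0))
print(pdf_via_change_of_variable(x, mu, -2.0))
print(pdf_closed_form(x, mu, 4.0))
```

All three printed values should coincide. That the results for σ = 2 and σ = -2 agree illustrates the second argument in the proof of Note 19.3.4: the density depends on σ only through σ².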

Since the family of Gaussian random variables is closed under deterministic affine
transformations (Note 19.3.3), it follows that if $X \sim \mathcal{N}(\mu, \sigma^2)$ with $\sigma^2 > 0$, then
$(X - \mu)/\sigma$ is also a Gaussian random variable. Since it is of zero mean and of
unit variance, it follows that it must be a standard Gaussian, because there is only
one Gaussian distribution of zero mean and unit variance (Note 19.3.4). We thus
conclude that for $\sigma^2 > 0$ and arbitrary $\mu \in \mathbb{R}$,

$$\bigl( X \sim \mathcal{N}(\mu, \sigma^2) \bigr) \implies \Bigl( \frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1) \Bigr). \tag{19.8}$$

Recall that the Cumulative Distribution Function $F_X(\cdot)$ of a RV $X$ is defined
for $x \in \mathbb{R}$ as

$$F_X(x) = \Pr[X \le x] = \int_{-\infty}^{x} f_X(\xi)\, d\xi,$$

where the second equality holds if $X$ has a density function $f_X(\cdot)$. If $W$ is a
standard Gaussian, then its CDF is thus given by

$$F_W(w) = \int_{-\infty}^{w} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\xi^2}{2}}\, d\xi, \quad w \in \mathbb{R}.$$

There is, alas, no closed-form expression for this integral. To handle such
expressions we next introduce the Q-function.

19.4 The Q-Function

The Q-function maps every $\alpha \in \mathbb{R}$ to the probability that a standard Gaussian
exceeds it:

Definition 19.4.1 (The Q-Function). The Q-function is defined by

$$Q(\alpha) = \frac{1}{\sqrt{2\pi}} \int_{\alpha}^{\infty} e^{-\xi^2/2}\, d\xi, \quad \alpha \in \mathbb{R}. \tag{19.9}$$

For a graphical interpretation of this integral see Figure 19.3.

Figure 19.3: $Q(\alpha)$ is the area to the right of $\alpha$ under the standard Gaussian density
plot. Here it is represented by the shaded area.

Since the Q-function is a well-tabulated function, we are usually happy when we can
express answers to various questions using this function. The CDF of a standard
Gaussian $W$ can be expressed using the Q-function as follows:

$$\begin{aligned}
F_W(w) &= \Pr[W \le w] \\
&= 1 - \Pr[W \ge w] \\
&= 1 - Q(w), \quad w \in \mathbb{R},
\end{aligned} \tag{19.10}$$

where the second equality follows because the standard Gaussian has a density,
so $\Pr[W = w] = 0$. Similarly, with the aid of the Q-function we can express the
probability that a standard Gaussian $W$ lies in some given interval $[a, b]$:

$$\Pr[a \le W \le b] = \Pr[W \ge a] - \Pr[W > b] = Q(a) - Q(b), \quad a \le b.$$

More generally, if $X \sim \mathcal{N}(\mu, \sigma^2)$ with $\sigma > 0$, then

$$\begin{aligned}
\Pr[a \le X \le b] &= \Pr[X \ge a] - \Pr[X > b], \quad a \le b \\
&= \Pr\Bigl[ \frac{X - \mu}{\sigma} \ge \frac{a - \mu}{\sigma} \Bigr] - \Pr\Bigl[ \frac{X - \mu}{\sigma} > \frac{b - \mu}{\sigma} \Bigr], \quad \sigma > 0 \\
&= Q\Bigl( \frac{a - \mu}{\sigma} \Bigr) - Q\Bigl( \frac{b - \mu}{\sigma} \Bigr), \quad (a \le b,\ \sigma > 0),
\end{aligned} \tag{19.11}$$

where the last equality follows because $(X - \mu)/\sigma$ is a standard Gaussian; see
(19.8). Letting $b$ tend to $+\infty$ in (19.11), we obtain the probability of a half ray:

$$\Pr[X \ge a] = Q\Bigl( \frac{a - \mu}{\sigma} \Bigr), \quad \sigma > 0. \tag{19.12a}$$

And letting $a$ tend to $-\infty$ we obtain

$$\Pr[X \le b] = 1 - Q\Bigl( \frac{b - \mu}{\sigma} \Bigr), \quad \sigma > 0. \tag{19.12b}$$
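In software the Q-function is usually not read from a table but evaluated through the complementary error function, using the standard identity Q(α) = ½ erfc(α/√2). The sketch below relies on that identity (which is not part of the text above) together with (19.11); the function names are hypothetical.

```python
import math

def q_function(alpha):
    # Q(alpha) = Pr[W >= alpha] for a standard Gaussian W, computed via the
    # identity Q(alpha) = 0.5 * erfc(alpha / sqrt(2)).
    return 0.5 * math.erfc(alpha / math.sqrt(2.0))

def prob_interval(a, b, mu, sigma):
    # Pr[a <= X <= b] for X ~ N(mu, sigma^2) with sigma > 0, as in (19.11).
    return q_function((a - mu) / sigma) - q_function((b - mu) / sigma)

# Probability that a Gaussian lies within one standard deviation of its mean.
print(prob_interval(-1.0, 1.0, 0.0, 1.0))   # about 0.6827
print(prob_interval(3.0, 7.0, 5.0, 2.0))    # same event after rescaling
```

The two printed probabilities coincide because, by (19.8), standardizing X reduces every interval probability to one for a standard Gaussian.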



The Q-function is usually only tabulated for nonnegative arguments, because the
standard Gaussian density (19.1) is symmetric: if $W \sim \mathcal{N}(0, 1)$ then, by the
symmetry of its density,

$$\Pr[W \ge -\alpha] = \Pr[W \le \alpha] = 1 - \Pr[W \ge \alpha], \quad \alpha \in \mathbb{R}.$$

Consequently, as illustrated in Figure 19.4,

$$Q(\alpha) + Q(-\alpha) = 1, \quad \alpha \in \mathbb{R}, \tag{19.13}$$

and it suffices to tabulate the Q-function for nonnegative arguments. Note that,
by (19.13),

$$Q(0) = \frac{1}{2}. \tag{19.14}$$



An alternative expression for the Q-function as an integral with fixed integration
limits is known as Craig's formula:

$$Q(\alpha) = \frac{1}{\pi} \int_{0}^{\pi/2} e^{-\frac{\alpha^2}{2 \sin^2 \varphi}}\, d\varphi, \quad \alpha \ge 0. \tag{19.15}$$

This expression can be derived by computing a two-dimensional integral in two
different ways as follows. Let $X \sim \mathcal{N}(0, 1)$ and $Y \sim \mathcal{N}(0, 1)$ be independent.
Consider the probability of the event "$X \ge 0$ and $Y \ge \alpha$" where $\alpha \ge 0$. Since the
two random variables are independent, it follows that

$$\Pr[X \ge 0 \text{ and } Y \ge \alpha] = \Pr[X \ge 0]\, \Pr[Y \ge \alpha] = \frac{1}{2}\, Q(\alpha), \tag{19.16}$$

Figure 19.4: The identity $Q(\alpha) + Q(-\alpha) = 1$.

Figure 19.5: Use of polar coordinates to compute $\frac{1}{2} Q(\alpha)$.



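where the second equality follows from (19.14). We now proceed to compute the
LHS of the above in polar coordinates centered at the origin (Figure 19.5):

$$\begin{aligned}
\Pr[X \ge 0 \text{ and } Y \ge \alpha] &= \int_{0}^{\infty} \int_{\alpha}^{\infty} \frac{1}{2\pi}\, e^{-\frac{x^2 + y^2}{2}}\, dy\, dx \\
&= \int_{0}^{\pi/2} \int_{\frac{\alpha}{\sin\varphi}}^{\infty} \frac{1}{2\pi}\, e^{-r^2/2}\, r\, dr\, d\varphi, \quad \alpha \ge 0 \\
&= \frac{1}{2\pi} \int_{0}^{\pi/2} \int_{\frac{\alpha^2}{2 \sin^2\varphi}}^{\infty} e^{-t}\, dt\, d\varphi \\
&= \frac{1}{2\pi} \int_{0}^{\pi/2} e^{-\frac{\alpha^2}{2 \sin^2\varphi}}\, d\varphi, \quad \alpha \ge 0,
\end{aligned} \tag{19.17}$$

where we have performed the change of variable $t = r^2/2$. The integral
representation (19.15) now follows from (19.16) & (19.17).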
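Because the integration limits in Craig's formula (19.15) are fixed, the formula lends itself to simple numerical quadrature. The sketch below evaluates it with a midpoint rule and compares the result with the value obtained from the complementary error function via the standard identity Q(α) = ½ erfc(α/√2). The function names and the number of subintervals are assumptions made for the example.

```python
import math

def q_erfc(alpha):
    # Reference value via the identity Q(alpha) = 0.5 * erfc(alpha / sqrt(2)).
    return 0.5 * math.erfc(alpha / math.sqrt(2.0))

def q_craig(alpha, n=20_000):
    # Midpoint rule for Craig's formula (19.15), valid for alpha >= 0.
    # Midpoints avoid phi = 0, where sin(phi) vanishes.
    step = (math.pi / 2.0) / n
    total = 0.0
    for k in range(n):
        phi = (k + 0.5) * step
        total += math.exp(-alpha * alpha / (2.0 * math.sin(phi) ** 2))
    return total * step / math.pi

for alpha in (0.0, 0.5, 1.0, 3.0):
    print(alpha, q_craig(alpha), q_erfc(alpha))
```

For α = 0 the integrand is identically one, so the quadrature returns ½ up to rounding, in agreement with (19.14).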

We next describe various approximations for the Q-function. We are particularly
interested in its value for large arguments.² Since $Q(\alpha)$ is the probability that
a standard Gaussian exceeds $\alpha$, it follows that $\lim_{\alpha \to \infty} Q(\alpha) = 0$. Thus, large
arguments to the Q-function correspond to small values of the Q-function. The
following bounds justify the approximation

$$Q(\alpha) \approx \frac{1}{\sqrt{2\pi\alpha^2}}\, e^{-\frac{\alpha^2}{2}}, \quad \alpha \gg 1. \tag{19.18}$$

Proposition 19.4.2 (Estimates for the Q-function). The Q-function is bounded
by

$$\frac{1}{\sqrt{2\pi\alpha^2}}\, e^{-\alpha^2/2} \Bigl( 1 - \frac{1}{\alpha^2} \Bigr) < Q(\alpha) < \frac{1}{\sqrt{2\pi\alpha^2}}\, e^{-\alpha^2/2}, \quad \alpha > 0 \tag{19.19}$$

and

$$Q(\alpha) \le \frac{1}{2}\, e^{-\alpha^2/2}, \quad \alpha \ge 0. \tag{19.20}$$

² In Digital Communications this corresponds to scenarios with low probability of error.

Proof. The proof of (19.19) is omitted (but see Exercise 19.3). Inequality (19.20)
is proved by replacing the integrand in (19.15) with its maximal value, namely, its
value at $\varphi = \pi/2$. We shall see an alternative proof in Section 20.10. □
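To get a feel for how tight these estimates are, the sketch below tabulates the two sides of (19.19) and the bound (19.20) next to Q(α). The Q-function is evaluated through the standard identity Q(α) = ½ erfc(α/√2), which is an assumption of the example rather than something derived above; the function names are hypothetical.

```python
import math

def q_function(alpha):
    # Q via the identity Q(alpha) = 0.5 * erfc(alpha / sqrt(2)).
    return 0.5 * math.erfc(alpha / math.sqrt(2.0))

def upper_1919(alpha):
    # RHS of (19.19), which is also the approximation (19.18); needs alpha > 0.
    return math.exp(-alpha * alpha / 2.0) / math.sqrt(2.0 * math.pi * alpha * alpha)

def lower_1919(alpha):
    # LHS of (19.19); needs alpha > 0.
    return upper_1919(alpha) * (1.0 - 1.0 / (alpha * alpha))

def upper_1920(alpha):
    # RHS of (19.20); valid for alpha >= 0.
    return 0.5 * math.exp(-alpha * alpha / 2.0)

for alpha in (0.5, 1.0, 2.0, 4.0, 6.0):
    print(alpha, lower_1919(alpha), q_function(alpha), upper_1919(alpha), upper_1920(alpha))
```

For small α the lower bound in (19.19) is negative and hence uninformative, while for large α the two sides of (19.19) differ only by the factor 1 - 1/α², which is what justifies (19.18).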



19.5 Integrals of Exponentiated Quadratics 

The fact that (19.6) is a density and hence integrates to one, i.e.,

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx = 1, \tag{19.21}$$

can be used to compute seemingly complicated integrals. Here we shall show how
(19.21) can be used to derive the identity

$$\int_{-\infty}^{\infty} e^{-\alpha x^2 \pm \beta x}\, dx = \sqrt{\frac{\pi}{\alpha}}\, e^{\frac{\beta^2}{4\alpha}}, \quad \beta \in \mathbb{R},\ \alpha > 0. \tag{19.22}$$

Note that this identity is meaningless when $\alpha \le 0$, because in this case the
integrand is not integrable. For example, if $\alpha < 0$, then the integrand tends to infinity
as $|x|$ tends to $\infty$. If $\alpha = 0$ and $\beta \ne 0$, then the integrand tends to infinity either
as $x$ tends to $+\infty$ or as $x$ tends to $-\infty$ (depending on the sign of $\beta$). Finally, if
both $\alpha$ and $\beta$ are zero, then the integrand is 1, which is not integrable. Note also
that, by considering the change of variable $u = -x$, one can verify that the sign
of $\beta$ on the LHS of this identity is immaterial.

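The trick to deriving (19.22) is to complete the exponent to a square and to then
apply (19.21):

$$\begin{aligned}
\int_{-\infty}^{\infty} e^{-\alpha x^2 + \beta x}\, dx &= \int_{-\infty}^{\infty} \exp\biggl( -\frac{x^2 - \frac{\beta}{\alpha} x}{2 (1/\sqrt{2\alpha})^2} \biggr)\, dx \\
&= \int_{-\infty}^{\infty} \exp\biggl( -\frac{\bigl(x - \frac{\beta}{2\alpha}\bigr)^2}{2 (1/\sqrt{2\alpha})^2} + \frac{\beta^2}{4\alpha} \biggr)\, dx \\
&= e^{\frac{\beta^2}{4\alpha}} \int_{-\infty}^{\infty} \exp\biggl( -\frac{\bigl(x - \frac{\beta}{2\alpha}\bigr)^2}{2 (1/\sqrt{2\alpha})^2} \biggr)\, dx \\
&= e^{\frac{\beta^2}{4\alpha}} \sqrt{2\pi (1/\sqrt{2\alpha})^2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi (1/\sqrt{2\alpha})^2}} \exp\biggl( -\frac{\bigl(x - \frac{\beta}{2\alpha}\bigr)^2}{2 (1/\sqrt{2\alpha})^2} \biggr)\, dx \\
&= e^{\frac{\beta^2}{4\alpha}} \sqrt{2\pi (1/\sqrt{2\alpha})^2} \\
&= \sqrt{\frac{\pi}{\alpha}}\, e^{\frac{\beta^2}{4\alpha}},
\end{aligned}$$

where the first equality follows by rewriting the integrand so that the term $x^2$ in
the numerator is of coefficient one and so that the denominator has the form $2\sigma^2$
for $\sigma$ which turns out here to be given by $\sigma = 1/\sqrt{2\alpha}$; the second follows by
completing the square; the third by taking the multiplicative constant out of the
integral; the fourth by multiplying and dividing the integral by $\sqrt{2\pi\sigma^2}$ so as to
bring the integrand to the form of the density of a Gaussian; the fifth by (19.21);
and the sixth equality by trivial algebra.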
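The identity for the integral of exp(-αx² + βx) derived above can be spot-checked numerically. The sketch below compares a midpoint-rule approximation of the integral with the closed form √(π/α)·exp(β²/(4α)); the truncation interval, the step count, and the parameter values are arbitrary choices for illustration.

```python
import math

def lhs_numeric(alpha, beta, limit=30.0, n=60_000):
    # Midpoint rule for the integral of exp(-alpha x^2 + beta x) over
    # [-limit, limit]; for the parameters used below the tails are negligible.
    step = 2.0 * limit / n
    total = 0.0
    for k in range(n):
        x = -limit + (k + 0.5) * step
        total += math.exp(-alpha * x * x + beta * x)
    return total * step

def rhs_closed_form(alpha, beta):
    # sqrt(pi / alpha) * exp(beta^2 / (4 alpha)), valid for alpha > 0.
    return math.sqrt(math.pi / alpha) * math.exp(beta * beta / (4.0 * alpha))

alpha, beta = 0.7, 1.3
print(lhs_numeric(alpha, beta), rhs_closed_form(alpha, beta))
print(lhs_numeric(alpha, -beta), rhs_closed_form(alpha, beta))
```

The second line uses -β and should print the same pair of numbers, in line with the remark that the sign of β is immaterial.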



19.6 The Moment Generating Function 

As an application of (19.22) we next derive the Moment Generating Function
(MGF) of a Gaussian RV. Recall that the MGF of a RV $X$ is denoted by $M_X(\cdot)$
and is given by

$$M_X(\theta) \triangleq \mathrm{E}\bigl[ e^{\theta X} \bigr] \tag{19.23}$$

for all $\theta \in \mathbb{R}$ for which this expectation is finite. If $X$ has density $f_X(\cdot)$, then its
MGF can be written as

$$M_X(\theta) = \int_{-\infty}^{\infty} f_X(x)\, e^{\theta x}\, dx, \tag{19.24}$$

thus highlighting the connection between the MGF of $X$ and the double-sided
Laplace Transform of its density.
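If $X \sim \mathcal{N}(\mu, \sigma^2)$ where $\sigma^2 > 0$, then

$$\begin{aligned}
M_X(\theta) &= \int_{-\infty}^{\infty} f_X(x)\, e^{\theta x}\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, e^{\theta x}\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{\xi^2}{2\sigma^2}}\, e^{\theta(\xi + \mu)}\, d\xi \\
&= e^{\theta\mu}\, \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{-\frac{\xi^2}{2\sigma^2} + \theta\xi}\, d\xi \\
&= e^{\theta\mu}\, \frac{1}{\sqrt{2\pi\sigma^2}}\, \sqrt{\frac{\pi}{1/(2\sigma^2)}}\; e^{\frac{\theta^2}{4/(2\sigma^2)}} \\
&= e^{\theta\mu + \frac{1}{2}\theta^2\sigma^2}, \quad \theta \in \mathbb{R},
\end{aligned}$$

where the first equality follows from (19.24); the second from (19.6); the third by
changing the integration variable to $\xi = x - \mu$; the fourth by rearranging terms;
the fifth from (19.22) with the substitution of $1/(2\sigma^2)$ for $\alpha$ and of $\theta$ for $\beta$; and the
final by simple algebra. This can be verified to hold also when $\sigma^2 = 0$. Thus,

$$\bigl( X \sim \mathcal{N}(\mu, \sigma^2) \bigr) \implies \Bigl( M_X(\theta) = e^{\theta\mu + \frac{1}{2}\theta^2\sigma^2}, \quad \theta \in \mathbb{R} \Bigr). \tag{19.25}$$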
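The closed form for the Gaussian MGF, exp(θμ + θ²σ²/2), can be compared with a direct numerical evaluation of the defining integral (19.24). The sketch below does this with a midpoint rule; the function names, the truncation interval, and the parameter values are assumptions chosen only for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    # Density (19.6) of a Gaussian of mean mu and positive variance sigma2.
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def mgf_numeric(theta, mu, sigma2, half_width=40.0, n=80_000):
    # Midpoint rule for the integral (19.24) over [mu - half_width, mu + half_width].
    step = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        x = mu - half_width + (k + 0.5) * step
        total += gaussian_pdf(x, mu, sigma2) * math.exp(theta * x)
    return total * step

def mgf_closed_form(theta, mu, sigma2):
    # Closed form exp(theta * mu + theta^2 * sigma2 / 2).
    return math.exp(theta * mu + 0.5 * theta * theta * sigma2)

print(mgf_numeric(0.8, 0.5, 2.0), mgf_closed_form(0.8, 0.5, 2.0))
```

Setting θ = 0 in either function returns one, as it must for any MGF, since the density integrates to one.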
19.7 The Characteristic Function of Gaussians

19.7.1 The Characteristic Function 

Recall that the Characteristic Function $\Phi_X(\cdot)$ of a random variable $X$ is defined
for every $\varpi \in \mathbb{R}$ by

$$\Phi_X(\varpi) = \mathrm{E}\bigl[ e^{i\varpi X} \bigr] = \int_{-\infty}^{\infty} f_X(x)\, e^{i\varpi x}\, dx,$$

where the second equality holds if $X$ has density $f_X(\cdot)$. The second equality
demonstrates that the characteristic function is related to the Fourier Transform of the
density function but, by convention, there are no $2\pi$'s, and the complex exponential
is not conjugated. If we allow for complex arguments to the MGF (by performing
an analytic continuation), then the characteristic function can be viewed as the
MGF evaluated on the imaginary axis:

$$\Phi_X(\varpi) = M_X(i\varpi), \quad \varpi \in \mathbb{R}. \tag{19.26}$$

Some of the properties of the characteristic function are summarized next.

Proposition 19.7.1 (On the Characteristic Function). Let $X$ be a random variable
of characteristic function $\Phi_X(\cdot)$.

(i) If $\mathrm{E}[X^n] < \infty$ for some $n \in \mathbb{N}$, then $\Phi_X(\cdot)$ is differentiable $n$ times and the
$\nu$-th moment of $X$ is related to the $\nu$-th derivative of $\Phi_X(\cdot)$ at zero via the
relation

\[
\mathrm{E}[X^\nu] = \frac{1}{i^\nu} \left. \frac{\mathrm{d}^\nu \Phi_X(\varpi)}{\mathrm{d}\varpi^\nu} \right|_{\varpi = 0}, \quad \nu = 1, \ldots, n. \tag{19.27}
\]

(ii) Two random variables of identical characteristic functions must have the
same distribution.

(iii) If $X$ and $Y$ are independent random variables of characteristic functions
$\Phi_X(\cdot)$ and $\Phi_Y(\cdot)$, then the characteristic function $\Phi_{X+Y}(\cdot)$ of their sum is
given by the product of their characteristic functions:

\[
\bigl(X \text{ \& } Y \text{ independent}\bigr) \implies \bigl(\Phi_{X+Y}(\varpi) = \Phi_X(\varpi)\, \Phi_Y(\varpi), \quad \varpi \in \mathbb{R}\bigr). \tag{19.28}
\]



Proof. For a proof of Part (i) see (Shiryaev, 1996, Chapter II, § 12.3, Theorem 1). 
For Part (ii) see (Shiryaev, 1996, Chapter II, § 12.4, Theorem 2). For Part (iii) see 
(Shiryaev, 1996, Chapter II, § 12.5, Theorem 4). □ 
For $X \sim \mathcal{N}(\mu, \sigma^2)$ we obtain from (19.26) and (19.25) that$^{3}$

\[
\bigl(X \sim \mathcal{N}(\mu, \sigma^2)\bigr) \implies \Bigl(\Phi_X(\varpi) = e^{i\varpi\mu - \frac{1}{2}\varpi^2\sigma^2}, \quad \varpi \in \mathbb{R}\Bigr). \tag{19.29}
\]
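As a numerical sanity check of (19.29), the following minimal sketch compares the closed-form characteristic function with a sample average of $e^{i\varpi X}$. The function names, the parameter values, the sample size, and the seed are illustrative assumptions and are not part of the text.

```python
import cmath
import math
import random


def gaussian_cf(varpi, mu, sigma2):
    """Closed form (19.29): exp(i*varpi*mu - varpi^2 * sigma2 / 2)."""
    return cmath.exp(1j * varpi * mu - 0.5 * varpi**2 * sigma2)


def empirical_cf(varpi, samples):
    """Sample average of exp(i*varpi*x), an estimate of E[exp(i*varpi*X)]."""
    return sum(cmath.exp(1j * varpi * x) for x in samples) / len(samples)


rng = random.Random(0)   # arbitrary fixed seed so the run is repeatable
mu, sigma2 = 1.0, 4.0    # illustrative mean and variance
samples = [rng.gauss(mu, math.sqrt(sigma2)) for _ in range(100_000)]

for varpi in (0.0, 0.5, 1.0):
    # The two printed complex numbers should agree up to Monte Carlo error.
    print(varpi, gaussian_cf(varpi, mu, sigma2), empirical_cf(varpi, samples))
```

The agreement is only statistical: with $10^5$ samples the real and imaginary parts typically match to about two decimal places.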



19.7.2 Moments 

Since the standard Gaussian density decays faster than exponentially, it possesses 
moments of all orders. Those can be computed from the characteristic function 
(19.29) using Proposition 19.7.1 (i) by repeated differentiation. Using this approach 
we obtain that the moments of a standard Gaussian are 



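\[
\mathrm{E}[W^\nu] =
\begin{cases}
1 \times 3 \times \cdots \times (\nu - 1) & \text{if $\nu$ is even,} \\
0 & \text{if $\nu$ is odd,}
\end{cases}
\qquad W \sim \mathcal{N}(0, 1). \tag{19.30}
\]

We mention here in passing that$^{4}$

\[
\mathrm{E}\bigl[|W|^\nu\bigr] =
\begin{cases}
1 \times 3 \times \cdots \times (\nu - 1) & \text{if $\nu$ is even,} \\
\sqrt{\dfrac{2}{\pi}}\; 2^{(\nu - 1)/2} \Bigl(\dfrac{\nu - 1}{2}\Bigr)! & \text{if $\nu$ is odd,}
\end{cases}
\qquad W \sim \mathcal{N}(0, 1) \tag{19.31}
\]

(Johnson, Kotz, and Balakrishnan, 1994a, Chapter 18, Section 3, Equation (18.13)).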
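The moment formulas (19.30) and (19.31) are easy to check numerically. The sketch below evaluates them and compares against a midpoint-rule integration of $x^\nu$ (or $|x|^\nu$) against the standard Gaussian density. The function names, the integration range, and the step count are assumptions made for illustration.

```python
import math


def gaussian_moment(nu):
    """E[W^nu] for W ~ N(0,1) per (19.30): 1*3*...*(nu-1) if nu is even, else 0."""
    if nu % 2 == 1:
        return 0
    return math.prod(range(1, nu, 2))


def gaussian_abs_moment(nu):
    """E[|W|^nu] for W ~ N(0,1) per (19.31)."""
    if nu % 2 == 0:
        return float(math.prod(range(1, nu, 2)))
    k = (nu - 1) // 2
    return math.sqrt(2 / math.pi) * 2**k * math.factorial(k)


def numeric_moment(nu, absolute=False, half_width=12.0, steps=200_000):
    """Midpoint-rule approximation of the moment integral over [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0.0
    for j in range(steps):
        x = -half_width + (j + 0.5) * h
        base = abs(x) if absolute else x
        total += base**nu * math.exp(-x * x / 2)
    return total * h / math.sqrt(2 * math.pi)


for nu in range(1, 7):
    print(nu, gaussian_moment(nu), numeric_moment(nu),
          gaussian_abs_moment(nu), numeric_moment(nu, absolute=True))
```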



19.7.3 Sums of Independent Gaussians 

Using the characteristic function we next show: 

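Proposition 19.7.2 (The Sum of Two Independent Gaussians Is Gaussian). The
sum of two independent Gaussian random variables is a Gaussian RV.$^{5}$

Proof. Let $X \sim \mathcal{N}(\mu_x, \sigma_x^2)$ and $Y \sim \mathcal{N}(\mu_y, \sigma_y^2)$ be independent. By (19.29),

\[
\Phi_X(\varpi) = e^{i\varpi\mu_x - \frac{1}{2}\varpi^2\sigma_x^2}, \quad \varpi \in \mathbb{R},
\]
\[
\Phi_Y(\varpi) = e^{i\varpi\mu_y - \frac{1}{2}\varpi^2\sigma_y^2}, \quad \varpi \in \mathbb{R}.
\]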



3 It does require a (small) leap of faith to accept that (19.25) also holds for complex $\theta$. This can
be justified using analytic continuation. But there are also direct ways of deriving (19.29); see, for
example, (Williams, 1991, Chapter E, Exercise E16.4) or (Shiryaev, 1996, Chapter II, Section 12,
Paragraph 2, Example 2). Another approach is to express $\mathrm{d}\Phi_X(\varpi)/\mathrm{d}\varpi$ as $\mathrm{E}[iXe^{i\varpi X}]$ and to
use integration by parts to verify that the latter's expectation is equal to $-\varpi\Phi_X(\varpi)$ and to
then solve the differential equation $\mathrm{d}\Phi_X(\varpi)/\mathrm{d}\varpi = -\varpi\Phi_X(\varpi)$ with the condition $\Phi_X(0) = 1$ to
obtain that $\ln \Phi_X(\varpi) = -\frac{1}{2}\varpi^2$.

4 The distribution of $|W|$ is sometimes called half-normal. It is the positive square root of
the central chi-squared distribution with one degree of freedom.

5 More generally, as we shall see in Chapter 23, X + Y is Gaussian whenever X and Y are 
jointly Gaussian. And independent Gaussians are jointly Gaussian. 
Since the characteristic function of the sum of two independent random variables
is equal to the product of their characteristic functions (19.28),

\[
\Phi_{X+Y}(\varpi) = \Phi_X(\varpi)\, \Phi_Y(\varpi)
= e^{i\varpi(\mu_x + \mu_y) - \frac{1}{2}\varpi^2(\sigma_x^2 + \sigma_y^2)}, \quad \varpi \in \mathbb{R}.
\]

By (19.29), this is also the characteristic function of a $\mathcal{N}(\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2)$ RV.
Since the characteristic function of a random variable fully determines its law
(Proposition 19.7.1 (ii)), $X + Y$ must be $\mathcal{N}(\mu_x + \mu_y, \sigma_x^2 + \sigma_y^2)$. □

Using induction one can generalize this proposition to any finite number of
random variables: if $X_1, \ldots, X_n$ are independent Gaussian random variables, then
their sum is Gaussian. Applying this to $\alpha_1 X_1, \ldots, \alpha_n X_n$, which are independent
Gaussians whenever $X_1, \ldots, X_n$ are independent Gaussians, we obtain:

Proposition 19.7.3 (Linear Combinations of Independent Gaussians). If the
random variables $X_1, \ldots, X_n$ are independent Gaussians, and if $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ are
deterministic, then the RV $Y = \sum_{\ell=1}^{n} \alpha_\ell X_\ell$ is Gaussian with mean and variance

\[
\mathrm{E}[Y] = \sum_{\ell=1}^{n} \alpha_\ell\, \mathrm{E}[X_\ell],
\]
\[
\mathrm{Var}[Y] = \sum_{\ell=1}^{n} \alpha_\ell^2\, \mathrm{Var}[X_\ell].
\]
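The mean and variance formulas for a linear combination of independent Gaussians can be illustrated by simulation. The following is a minimal sketch; the coefficients, means, variances, sample size, and seed are arbitrary illustrative choices, and the helper name `linear_combination_law` is hypothetical.

```python
import random
import statistics


def linear_combination_law(alphas, means, variances):
    """Mean and variance of Y = sum_l alpha_l * X_l for independent X_l."""
    mean = sum(a * m for a, m in zip(alphas, means))
    var = sum(a * a * v for a, v in zip(alphas, variances))
    return mean, var


alphas = [2.0, -1.0, 0.5]
means = [1.0, 0.0, -3.0]
variances = [1.0, 4.0, 9.0]
mean_y, var_y = linear_combination_law(alphas, means, variances)

rng = random.Random(1)  # arbitrary fixed seed
draws = [
    sum(a * rng.gauss(m, v**0.5) for a, m, v in zip(alphas, means, variances))
    for _ in range(100_000)
]

# Predicted values next to their empirical counterparts.
print(mean_y, statistics.fmean(draws))
print(var_y, statistics.pvariance(draws))
```

The simulation checks only the first two moments; that the sum is actually Gaussian is the content of the proposition and is not tested here.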

19.8 Central and Noncentral Chi-Square Random Variables 

We summarize here some of the definitions and main properties of the central and
noncentral $\chi^2$ distributions and of some related distributions. We shall only use
three results from this section: that the sum of the squares of two independent
$\mathcal{N}(0, 1)$ random variables has a mean-2 exponential distribution; that the
distribution of the sum of the squares of $n$ independent Gaussian random variables of
unit-variance and possibly different means depends only on $n$ and on the sum of
the squared means; and that the MGF of this latter sum has a simple explicit form.

These results can be derived quite easily from the MGF of a squared Gaussian RV,
an MGF which, using (19.22), can be shown to be given by

\[
\bigl(X \sim \mathcal{N}(\mu, \sigma^2)\bigr) \implies \biggl(M_{X^2}(\theta) = \frac{1}{\sqrt{1 - 2\sigma^2\theta}}\; e^{-\frac{\mu^2}{2\sigma^2}}\; e^{\frac{\mu^2}{2\sigma^2(1 - 2\sigma^2\theta)}}, \quad \theta < \frac{1}{2\sigma^2}\biggr). \tag{19.32}
\]

With a small leap of faith we can assume that (19.32) also holds for complex
arguments whose real part is smaller than $1/(2\sigma^2)$ so that upon substituting $i\varpi$
for $\theta$ we can obtain the characteristic function

\[
\bigl(X \sim \mathcal{N}(\mu, \sigma^2)\bigr) \implies \biggl(\Phi_{X^2}(\varpi) = \frac{1}{\sqrt{1 - i 2\sigma^2\varpi}}\; e^{-\frac{\mu^2}{2\sigma^2}}\; e^{\frac{\mu^2}{2\sigma^2(1 - i 2\sigma^2\varpi)}}, \quad \varpi \in \mathbb{R}\biggr). \tag{19.33}
\]
19.8.1 The Central $\chi^2$ Distribution and Related Distributions

The central $\chi^2$ distribution with $n$ degrees of freedom is denoted by $\chi^2_n$
and is defined as the distribution of the sum of the squares of $n$ IID zero-mean
unit-variance Gaussian random variables:

\[
\bigl(X_1, \ldots, X_n \sim \text{IID } \mathcal{N}(0, 1)\bigr) \implies \Bigl(\sum_{j=1}^{n} X_j^2 \sim \chi^2_n\Bigr). \tag{19.34}
\]

Using the fact that the MGF of the sum of independent random variables is the
product of their MGFs and using (19.32) with $\mu = 0$ and $\sigma^2 = 1$, we obtain that
the MGF of the central $\chi^2$ distribution with $n$ degrees of freedom is given by

\[
\mathrm{E}\bigl[e^{\theta \chi^2_n}\bigr] = \frac{1}{(1 - 2\theta)^{n/2}}, \quad \theta < \frac{1}{2}. \tag{19.35}
\]

Similarly, by (19.33) and the fact that the characteristic function of the sum of
independent random variables is the product of their characteristic functions (or
by substituting $i\varpi$ for $\theta$ in (19.35)), we obtain that the characteristic function of
the central $\chi^2$ distribution with $n$ degrees of freedom is given by

\[
\mathrm{E}\bigl[e^{i\varpi \chi^2_n}\bigr] = \frac{1}{(1 - i2\varpi)^{n/2}}, \quad \varpi \in \mathbb{R}. \tag{19.36}
\]

Notice that for $n = 2$ this characteristic function is given by $\varpi \mapsto 1/(1 - i2\varpi)$,
which is the characteristic function of the mean-2 exponential density

\[
\frac{1}{2}\, e^{-x/2}\; \mathrm{I}\{x > 0\}, \quad x \in \mathbb{R}.
\]

Since two random variables of identical characteristic functions must be of equal
law (Proposition 19.7.1 (ii)), we conclude:

Note 19.8.1. The central $\chi^2$ distribution with two degrees of freedom $\chi^2_2$ is the
mean-2 exponential distribution.

From (19.36) and the relationship between the moments of a distribution and the
derivatives at zero of its characteristic function (19.27), one can verify that the
$\nu$-th moment of a $\chi^2_n$ RV is given by

\[
\mathrm{E}\bigl[(\chi^2_n)^\nu\bigr] = n \times (n + 2) \times \cdots \times \bigl(n + 2(\nu - 1)\bigr), \quad \nu \in \mathbb{N}, \tag{19.37}
\]

so the mean is $n$; the second moment is $n(n + 2)$; and the variance is $2n$.

Since the sum of the squares of random variables must be nonnegative, the density
of the $\chi^2_n$ distribution is zero on the negative numbers. It is given by

\[
f_{\chi^2_n}(x) = \frac{1}{2^{n/2}\, \Gamma(n/2)}\; e^{-x/2}\, x^{(n/2) - 1}\; \mathrm{I}\{x > 0\}, \tag{19.38}
\]

where $\Gamma(\cdot)$ is the Gamma function, which is defined by

\[
\Gamma(\xi) = \int_0^\infty e^{-t}\, t^{\xi - 1} \,\mathrm{d}t, \quad \xi > 0. \tag{19.39}
\]
If the number of degrees of freedom is even, then the density has a particularly
simple form:

\[
f_{\chi^2_{2k}}(x) = \frac{1}{2^k (k - 1)!}\; e^{-x/2}\, x^{k - 1}\; \mathrm{I}\{x > 0\}, \quad k \in \mathbb{N}, \tag{19.40}
\]

thus demonstrating again that when the number of degrees of freedom is two, the
central $\chi^2$ distribution is the mean-2 exponential distribution (Note 19.8.1).
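The density formulas for the central $\chi^2$ distribution and the moment formula (19.37) can be coded directly. The sketch below does so and checks that the general density reduces to the mean-2 exponential density when $n = 2$ and to the even-degrees-of-freedom form when $n = 2k$. The function names and the evaluation points are illustrative assumptions.

```python
import math


def chi2_density(x, n):
    """Central chi-square density with n degrees of freedom, per (19.38)."""
    if x <= 0:
        return 0.0
    return math.exp(-x / 2) * x ** (n / 2 - 1) / (2 ** (n / 2) * math.gamma(n / 2))


def chi2_density_even(x, k):
    """Central chi-square density with 2k degrees of freedom, per (19.40)."""
    if x <= 0:
        return 0.0
    return math.exp(-x / 2) * x ** (k - 1) / (2**k * math.factorial(k - 1))


def chi2_moment(n, nu):
    """nu-th moment per (19.37): n * (n + 2) * ... * (n + 2*(nu - 1))."""
    return math.prod(n + 2 * j for j in range(nu))


x = 1.3
print(chi2_density(x, 2), 0.5 * math.exp(-x / 2))      # mean-2 exponential density
print(chi2_density(2.7, 6), chi2_density_even(2.7, 3))  # the two forms for n = 6
n = 5
print(chi2_moment(n, 1), chi2_moment(n, 2) - chi2_moment(n, 1) ** 2)  # mean n, variance 2n
```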

A related distribution is the generalized Rayleigh distribution, which is the
distribution of the square root of a random variable having a $\chi^2_n$ distribution. The
density of the generalized Rayleigh distribution is given by

\[
f_{\sqrt{\chi^2_n}}(x) = \frac{2}{2^{n/2}\, \Gamma(n/2)}\; x^{n - 1}\, e^{-x^2/2}\; \mathrm{I}\{x > 0\}, \tag{19.41}
\]

and its moments by

\[
\mathrm{E}\Bigl[\bigl(\sqrt{\chi^2_n}\bigr)^\nu\Bigr] = \frac{2^{\nu/2}\, \Gamma\bigl((n + \nu)/2\bigr)}{\Gamma(n/2)}, \quad \nu \in \mathbb{N}. \tag{19.42}
\]

The Rayleigh distribution is the distribution of the square root of a $\chi^2_2$ random
variable, i.e., the distribution of the square root of a mean-2 exponential random
variable. The density of the Rayleigh distribution is obtained by setting $n = 2$ in
(19.41):

\[
f_{\sqrt{\chi^2_2}}(x) = x\, e^{-x^2/2}\; \mathrm{I}\{x > 0\}. \tag{19.43}
\]



19.8.2 The Noncentral $\chi^2$ Distribution and Related Distributions

Using (19.32) and the fact that the MGF of the sum of independent random
variables is the product of their MGFs, we obtain that if $X_1, \ldots, X_n$ are independent
with $X_j \sim \mathcal{N}(\mu_j, \sigma^2)$, then the MGF of $\sum_j X_j^2$ is given by

\[
\biggl(\frac{1}{\sqrt{1 - 2\sigma^2\theta}}\biggr)^{n} e^{-\frac{\sum_{j=1}^{n} \mu_j^2}{2\sigma^2}}\; e^{\frac{\sum_{j=1}^{n} \mu_j^2}{2\sigma^2(1 - 2\sigma^2\theta)}}, \quad \theta < \frac{1}{2\sigma^2}. \tag{19.44}
\]

Noting that this MGF depends on the individual means $\mu_1, \ldots, \mu_n$ only via the
sum of their squares $\sum_j \mu_j^2$, we obtain:

Note 19.8.2. The distribution of the sum of the squares of independent
equivariance Gaussians is determined by their number, their common variance, and by the
sum of the squares of their means.

The distribution of the sum of the squares of $n$ independent unit-variance Gaussians whose squared
means sum to $\lambda$ is called the noncentral $\chi^2$ distribution with $n$ degrees of
freedom and noncentrality parameter $\lambda$. This distribution is denoted by $\chi^2_{n,\lambda}$.
Substituting $\sigma^2 = 1$ in (19.44) we obtain that the MGF of the $\chi^2_{n,\lambda}$ distribution is

\[
\mathrm{E}\bigl[e^{\theta \chi^2_{n,\lambda}}\bigr] = \biggl(\frac{1}{\sqrt{1 - 2\theta}}\biggr)^{n} e^{-\frac{\lambda}{2}}\; e^{\frac{\lambda}{2(1 - 2\theta)}}, \quad \theta < \frac{1}{2}. \tag{19.45}
\]
A special case of this distribution is the central $\chi^2$ distribution, which corresponds
to the case where the noncentrality parameter $\lambda$ is zero.

Explicit expressions for the density of the noncentral $\chi^2$ distribution can be found
in (Johnson, Kotz, and Balakrishnan, 1994b, Chapter 29, Equation (29.4)) and in
(Simon, 2002, Chapter 2). An interesting representation of this density in terms
of the density $f_{\chi^2_{\nu}}$ of the central $\chi^2$ distribution is:

\[
f_{\chi^2_{n,\lambda}}(x) = \sum_{j=0}^{\infty} \biggl(\frac{e^{-\lambda/2}\, (\lambda/2)^j}{j!}\biggr) f_{\chi^2_{n+2j}}(x), \quad x \in \mathbb{R}. \tag{19.46}
\]

It demonstrates that a $\chi^2_{n,\lambda}$ random variable $X$ can be generated by picking a
random integer $j$ according to the Poisson distribution of parameter $\lambda/2$ and by
then generating a central $\chi^2$ random variable of $n + 2j$ degrees of freedom. That
is, to generate a $\chi^2_{n,\lambda}$ random variable $X$, generate some random variable $J$ taking
value in the nonnegative integers according to the law

\[
\Pr[J = j] = e^{-\lambda/2}\, \frac{(\lambda/2)^j}{j!}, \quad j = 0, 1, \ldots \tag{19.47}
\]

and then generate $X$ according to the central $\chi^2$ distribution with $n + 2j$ degrees of
freedom, where $j$ is the outcome of $J$.
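The two-stage recipe above can be sketched in code and compared with the direct construction as a sum of squares of unit-variance Gaussians. In this minimal sketch the Poisson sampler is Knuth's multiplication method (an assumption of convenience, adequate for small parameters), the direct sampler places the whole noncentrality on the first component (permitted by Note 19.8.2), and all names, parameter values, and the seed are illustrative. Since a central $\chi^2$ with $m$ degrees of freedom has mean $m$ by (19.37) and $J$ has mean $\lambda/2$, both sample averages should be close to $n + \lambda$.

```python
import math
import random
import statistics


def sample_poisson(rng, rate):
    """Poisson sample via Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_central_chi2(rng, dof):
    """Sum of the squares of dof IID N(0,1) draws, as in (19.34)."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))


def sample_noncentral_chi2_mixture(rng, n, lam):
    """Draw J per (19.47), then a central chi-square with n + 2J degrees of freedom."""
    j = sample_poisson(rng, lam / 2)
    return sample_central_chi2(rng, n + 2 * j)


def sample_noncentral_chi2_direct(rng, n, lam):
    """Sum of squares of n unit-variance Gaussians whose squared means sum to lam."""
    first = rng.gauss(math.sqrt(lam), 1.0) ** 2
    return first + sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n - 1))


rng = random.Random(2)  # arbitrary fixed seed
n, lam = 3, 2.5
mixture = [sample_noncentral_chi2_mixture(rng, n, lam) for _ in range(50_000)]
direct = [sample_noncentral_chi2_direct(rng, n, lam) for _ in range(50_000)]

print(n + lam, statistics.fmean(mixture), statistics.fmean(direct))
```

Matching means are of course only a necessary condition for the two samplers to share a law; comparing histograms or higher moments would give a sharper check.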

The density of the $\chi^2_{2,\lambda}$ distribution is

\[
f_{\chi^2_{2,\lambda}}(x) = \frac{1}{2}\, e^{-(x + \lambda)/2}\; I_0\bigl(\sqrt{\lambda x}\bigr)\; \mathrm{I}\{x > 0\}, \tag{19.48}
\]

where $I_0(\cdot)$ is the modified zeroth-order Bessel function, which is defined in (27.47)
ahead.

The generalized Rice distribution corresponds to the distribution of the square
root of a noncentral $\chi^2$ distribution with $n$ degrees of freedom and noncentrality
parameter $\lambda$. The case $n = 2$ is called the Rice distribution. The Rice distribution
is thus the distribution of the square root of a random variable having the
noncentral $\chi^2$ distribution with 2 degrees of freedom and noncentrality parameter $\lambda$. The
density of the Rice distribution is

\[
f_{\sqrt{\chi^2_{2,\lambda}}}(x) = x\, e^{-(x^2 + \lambda)/2}\; I_0\bigl(x\sqrt{\lambda}\bigr)\; \mathrm{I}\{x > 0\}. \tag{19.49}
\]

The following property of the noncentral $\chi^2$ is useful in detection theory. In the
statistics literature this property is called the Monotone Likelihood Ratio
property (Lehmann and Romano, 2005, Section 3.4). Alternatively, it is called the Total
Positivity of Order 2 of the function $(x, \lambda) \mapsto f_{\chi^2_{n,\lambda}}(x)$.

Proposition 19.8.3 (The Noncentral $\chi^2$ Family Has Monotone Likelihood Ratio).
Let $f_{\chi^2_{n,\lambda}}(\xi)$ denote the density at $\xi$ of the noncentral $\chi^2$ distribution with $n$ degrees
of freedom and noncentrality parameter $\lambda \ge 0$; see (19.46). Then for $\xi_0, \xi_1 > 0$
and $\lambda_0, \lambda_1 \ge 0$ we have

\[
\bigl(\xi_0 \le \xi_1 \text{ and } \lambda_0 \le \lambda_1\bigr) \implies \Bigl(f_{\chi^2_{n,\lambda_1}}(\xi_0)\, f_{\chi^2_{n,\lambda_0}}(\xi_1) \le f_{\chi^2_{n,\lambda_0}}(\xi_0)\, f_{\chi^2_{n,\lambda_1}}(\xi_1)\Bigr), \tag{19.50}
\]

i.e.,

\[
\bigl(\lambda_1 > \lambda_0\bigr) \implies \biggl(\xi \mapsto \frac{f_{\chi^2_{n,\lambda_1}}(\xi)}{f_{\chi^2_{n,\lambda_0}}(\xi)} \text{ is nondecreasing in } \xi > 0\biggr). \tag{19.51}
\]

Proof. See, for example, (Finner and Roters, 1997, Proposition 3.8). □

19.9 The Limit of Gaussians Is Gaussian 

There are a number of useful definitions of convergence for sequences of random 
variables. Here we briefly mention a few and show that, under each of these defi- 
nitions, the convergence of a sequence of Gaussian random variables to a random 
variable X implies that X is Gaussian. 

Let the random variables $X, X_1, X_2, \ldots$ be defined over a common probability space
$(\Omega, \mathcal{F}, P)$. We say that the sequence $X_1, X_2, \ldots$ converges to $X$ with probability
one or almost surely if

\[
\Pr\Bigl(\bigl\{\omega \in \Omega : \lim_{n \to \infty} X_n(\omega) = X(\omega)\bigr\}\Bigr) = 1. \tag{19.52}
\]

Thus, the sequence $X_1, X_2, \ldots$ converges to $X$ almost surely if there exists an event
$\mathcal{N} \in \mathcal{F}$ of probability zero such that for every $\omega \notin \mathcal{N}$ the sequence of real numbers
$X_1(\omega), X_2(\omega), \ldots$ converges to the real number $X(\omega)$.

The sequence $X_1, X_2, \ldots$ converges to $X$ in probability if

\[
\lim_{n \to \infty} \Pr\bigl[|X_n - X| \ge \epsilon\bigr] = 0, \quad \epsilon > 0. \tag{19.53}
\]

The sequence $X_1, X_2, \ldots$ converges to $X$ in mean square if

\[
\lim_{n \to \infty} \mathrm{E}\bigl[(X_n - X)^2\bigr] = 0. \tag{19.54}
\]

We refer the reader to (Shiryaev, 1996, Ch. II, Section 10, Theorem 2) for a proof 
that convergence in mean-square implies convergence in probability and for a proof 
that almost-sure convergence implies convergence in probability. Also, if a sequence 
converges in probability to X, then it has a subsequence that converges to X with 
probability one (Shiryaev, 1996, Ch. II, Section 10, Theorem 5). 

Theorem 19.9.1. Let the random variables $X, X_1, X_2, \ldots$ be defined over a common
probability space $(\Omega, \mathcal{F}, P)$. Assume that each of the random variables $X_1, X_2, \ldots$
is Gaussian. If the sequence $X_1, X_2, \ldots$ converges to $X$ in the sense of (19.52) or
(19.53) or (19.54), then $X$ must also be Gaussian.

Proof. Since both mean-square convergence and almost-sure convergence imply
convergence in probability, it suffices to prove the theorem in the case where the
sequence $X_1, X_2, \ldots$ converges to $X$ in probability. And since every sequence
converging to $X$ in probability has a subsequence converging to $X$ almost surely, it
suffices to prove the theorem for almost sure convergence. Our proof for this case
follows (Shiryaev, 1996, Ch. II, Section 13, Paragraph 5).

Since the random variables $X_1, X_2, \ldots$ are all Gaussian, it follows from (19.29) that

\[
\mathrm{E}\bigl[e^{i\varpi X_n}\bigr] = e^{i\varpi\mu_n - \frac{1}{2}\varpi^2\sigma_n^2}, \quad \varpi \in \mathbb{R}, \tag{19.55}
\]

where $\mu_n$ and $\sigma_n^2$ are the mean and variance of $X_n$. By the Dominated Convergence
Theorem it follows that the almost sure convergence of $X_1, X_2, \ldots$ to $X$ implies
that

\[
\lim_{n \to \infty} \mathrm{E}\bigl[e^{i\varpi X_n}\bigr] = \mathrm{E}\bigl[e^{i\varpi X}\bigr], \quad \varpi \in \mathbb{R}. \tag{19.56}
\]

It follows from (19.55) and (19.56) that

\[
\lim_{n \to \infty} e^{i\varpi\mu_n - \frac{1}{2}\varpi^2\sigma_n^2} = \mathrm{E}\bigl[e^{i\varpi X}\bigr], \quad \varpi \in \mathbb{R}. \tag{19.57}
\]

The limit in (19.57) can exist for every $\varpi \in \mathbb{R}$ only if there exist $\mu, \sigma^2$ such that
$\mu_n \to \mu$ and $\sigma_n^2 \to \sigma^2$. And in this case, by (19.57),

\[
\mathrm{E}\bigl[e^{i\varpi X}\bigr] = e^{i\varpi\mu - \frac{1}{2}\varpi^2\sigma^2}, \quad \varpi \in \mathbb{R},
\]

so, by Proposition 19.7.1 (ii) and by (19.29), $X$ is $\mathcal{N}(\mu, \sigma^2)$. □

Another type of convergence is convergence in distribution or weak
convergence, which is defined as follows. Let $F_1, F_2, \ldots$ denote the cumulative
distribution functions of the sequence of random variables $X_1, X_2, \ldots$. We say that the
sequence $F_1, F_2, \ldots$ (or sometimes $X_1, X_2, \ldots$) converges in distribution to the
cumulative distribution function $F(\cdot)$ if $F_n(\xi)$ converges to $F(\xi)$ at every point $\xi \in \mathbb{R}$
at which $F(\cdot)$ is continuous. That is,

\[
\bigl(F_n(\xi) \to F(\xi)\bigr), \quad \bigl(F(\cdot) \text{ is continuous at } \xi\bigr). \tag{19.58}
\]

Theorem 19.9.2. Let the sequence of random variables $X_1, X_2, \ldots$ be such that
$X_n \sim \mathcal{N}(\mu_n, \sigma_n^2)$ for every $n \in \mathbb{N}$. Then the sequence converges in distribution to
some limiting distribution if, and only if, there exist some $\mu$ and $\sigma^2$ such that

\[
\mu_n \to \mu \quad \text{and} \quad \sigma_n^2 \to \sigma^2. \tag{19.59}
\]

And if the sequence does converge in distribution, then it converges to the mean-$\mu$
variance-$\sigma^2$ Gaussian distribution.

Proof. See (Gikhman and Skorokhod, 1996, Chapter I, Section 3, Theorem 4) 
where this statement is proved in the multivariate case. □ 

For extensions of Theorems 19.9.1 & 19.9.2 to random vectors, see Theorems 23.9.1 
& 23.9.2 in Section 23.9. 
19.10 Additional Reading

The Gaussian distribution, its characteristic function, and its moment generating
function appear in almost every basic book on Probability Theory. For more on
the Q-function see (Verdú, 1998, Section 3.3) and (Simon, 2002). For more on
distributions related to the Gaussian distribution see (Simon, 2002), (Johnson,
Kotz, and Balakrishnan, 1994a), and (Johnson, Kotz, and Balakrishnan, 1994b).
For more on the central $\chi^2$ distribution see (Johnson, Kotz, and Balakrishnan,
1994a, Chapter 18) and (Simon, 2002, Chapter 2). For more on the noncentral $\chi^2$
distribution see (Johnson, Kotz, and Balakrishnan, 1994b, Chapter 29) and (Simon,
2002, Chapter 2). Various characterizations of the Gaussian distribution can be
found in (Bryc, 1995) and (Bogachev, 1998).

19.11 Exercises 



Exercise 19.1 (Sums of Independent Gaussians). Let $X_1 \sim \mathcal{N}(0, \sigma_1^2)$ and $X_2 \sim \mathcal{N}(0, \sigma_2^2)$
be independent. Convolve their densities to show that $X_1 + X_2$ is Gaussian.

Exercise 19.2 (Computing Probabilities). Let $X \sim \mathcal{N}(1, 3)$ and $Y \sim \mathcal{N}(-2, 4)$ be
independent. Express the probabilities $\Pr[X < 2]$ and $\Pr[2X + 3Y > -2]$ using the Q-function
with nonnegative arguments.

Exercise 19.3 (Bounds on the Q-function). Prove (19.19). We suggest changing the
integration variable in (19.9) to $\zeta = \xi - \alpha$ and then proving and using the inequality

\[
1 - \frac{\zeta^2}{2} \le \exp\Bigl(-\frac{\zeta^2}{2}\Bigr) \le 1, \quad \zeta \in \mathbb{R}.
\]

Exercise 19.4 (An Application of Craig's Formula). Let the random variables $Z \sim \mathcal{N}(0, 1)$
and $A$ be independent, where $A^2$ is of MGF $M_{A^2}(\cdot)$. Show that

\[
\Pr\bigl[Z \ge |A|\bigr] = \frac{1}{\pi} \int_0^{\pi/2} M_{A^2}\Bigl(-\frac{1}{2\sin^2\varphi}\Bigr) \,\mathrm{d}\varphi.
\]

Exercise 19.5 (An Expression for $Q^2(\alpha)$). In analogy to (19.15), derive the identity

\[
Q^2(\alpha) = \frac{1}{\pi} \int_0^{\pi/4} e^{-\frac{\alpha^2}{2\sin^2\varphi}} \,\mathrm{d}\varphi, \quad \alpha \ge 0.
\]

Exercise 19.6 (Expectation of $Q(X)$). Show that for any RV $X$

\[
\mathrm{E}\bigl[Q(X)\bigr] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \Pr[X \le \xi]\; e^{-\xi^2/2} \,\mathrm{d}\xi.
\]

(See (Verdú, 1998, Chapter 3, Section 3.3, Eq. (3.57)).)
Exercise 19.7 (Generating Gaussians from Uniform RVs).

(i) Let $W_1$ and $W_2$ be IID $\mathcal{N}(0, 1)$, and let $R = \sqrt{W_1^2 + W_2^2}$. Show that $R$ has
a Rayleigh distribution, i.e., that its density $f_R(r)$ is given for every $r \in \mathbb{R}$ by
$r\, e^{-r^2/2}\; \mathrm{I}\{r \ge 0\}$. What is the CDF $F_R(\cdot)$ of $R$?

(ii) Prove that if a RV $X$ is of density $f_X(\cdot)$ and of CDF $F_X(\cdot)$, then $F_X(X) \sim \mathcal{U}(0, 1)$.

(iii) Show that if $U_1$ and $U_2$ are IID $\mathcal{U}(0, 1)$ and if we define $R = \sqrt{\ln \frac{1}{U_1}}$ and $\Theta = 2\pi U_2$,
then $R \cos\Theta$ and $R \sin\Theta$ are IID $\mathcal{N}(0, 1/2)$.

Exercise 19.8 (Infinite Divisibility). Show that for any $\mu \in \mathbb{R}$ and $\sigma^2 > 0$ there exist IID
RVs $X$ and $Y$ such that $X + Y \sim \mathcal{N}(\mu, \sigma^2)$.

Exercise 19.9 (MGF of the Square of a Gaussian). Derive (19.32). 

Exercise 19.10 (The Distribution of the Magnitude). Show that if a random variable $X$
is of density $f_X(\cdot)$ and if $Y = |X|$, then the density $f_Y(\cdot)$ of $Y$ is

\[
f_Y(y) = \bigl(f_X(y) + f_X(-y)\bigr)\; \mathrm{I}\{y \ge 0\}, \quad y \in \mathbb{R}.
\]

Exercise 19.11 (Uniformly Distributed Random Variables). Suppose that $X \sim \mathcal{U}([0, 1])$.

(i) Find the characteristic function $x(-) of X. 
(ii) Show that if X and Y are independent with X as above, then X+Y is not Gaussian. 

Exercise 19.12 (Sums and Differences of IID RVs). Let X and Y be IID random variables 
with finite variances. Show that if X + Y and X — Y are independent, then X and Y are 
Gaussian. 

(See (Feller, 1971, Chapter III, Section 4).) 



Chapter 20 

Binary Hypothesis Testing 

20.1 Introduction 



In Digital Communications the task of the receiver is to observe the channel out- 
puts and to use these observations to accurately guess the data bits that were sent 
by the transmitter, i.e., the data bits that were fed to the modulator. Ideally, the 
guessing would be perfect, i.e., the receiver would make no errors. This, alas, is 
typically impossible because of the distortions and noise that the channel intro- 
duces. Indeed, while one can usually recover the data bits from the transmitted 
waveform (provided that the modulator is a one-to-one mapping), the receiver has 
no access to the transmitted waveform but only to the received waveform. And 
since the latter is typically a noisy version of the former, some errors are usually 
unavoidable. 

In this chapter we shall begin our study of how to guess intelligently, i.e., how, 
given the channel output, one should guess the data bits with as low a probability 
of error as possible. This study will help us not only in the design of receivers but 
also in the design of modulators that allow for reliable decoding from the channel's 
output. 

In the engineering literature the process of guessing the data bits based on the 
channel output is called "decoding." In the statistics literature this process is 
called "hypothesis testing." We like "guessing" because it demystifies the process. 

In most applications the channel output is a continuous-time waveform and we seek 
to decode a large number of bits. Nevertheless, for pedagogical reasons, we shall 
begin our study with the simpler case where we wish to decode only a single data 
bit. This corresponds in the statistics literature to "binary hypothesis testing," 
where the term "binary" reminds us that in this guessing problem there are only 
two alternatives. Moreover, we shall assume that the observation, rather than 
being a continuous-time waveform, is a vector or a scalar. In fact, we shall begin 
our study with the simplest case where there are no observations at all. 
20.2 Problem Formulation

In choosing a guessing strategy to minimize the probability of error, the labels
of the two alternatives are immaterial. The principles that guide us in guessing
the outcome of a fair coin toss (where the labels are "heads" or "tails") are the
same as for guessing the value of a random variable that takes on the values $+1$
and $-1$ equiprobably. (These are, of course, extremely simple cases that can be
handled with common sense.) Statisticians typically denote the two alternatives
by $H_0$ and $H_1$ and call them "hypotheses." We shall denote the two alternatives
by 0 and 1. We thus envision guessing the value of a random variable $H$ taking
value in the set $\{0, 1\}$ with probabilities

\[
\pi_0 = \Pr[H = 0], \quad \pi_1 = \Pr[H = 1]. \tag{20.1}
\]

The prior is the distribution of $H$ or the pair $(\pi_0, \pi_1)$. It reflects the state of our
knowledge about $H$ before having made any observations. We say that the prior
is nondegenerate if

\[
\pi_0, \pi_1 > 0. \tag{20.2}
\]

(If the prior is degenerate, then $H$ is deterministic and we can determine its value
without any observation. For example, if $\pi_0 = 0$ we always guess 1 and never err.)
The prior is uniform if $\pi_0 = \pi_1 = 1/2$.

Aiding us in the guess work is the observation $\mathbf{Y}$, which is a random vector taking
value in $\mathbb{R}^d$. (When $d = 1$ the observation is a random variable and we denote it
by $Y$.) We assume that $\mathbf{Y}$ is a column vector, so, using the notation of Section 17.2,

\[
\mathbf{Y} = \bigl(Y^{(1)}, \ldots, Y^{(d)}\bigr)^{\mathsf{T}}.
\]

Typically there is some statistical dependence between $\mathbf{Y}$ and $H$; otherwise, $\mathbf{Y}$
would be useless. If the dependence is so strong that from $\mathbf{Y}$ one can deduce $H$,
then our guess work is very easy: we simply compute from $\mathbf{Y}$ the value of $H$ and
declare the result as our guess; we never err. The cases of most interest to us
are therefore those where $\mathbf{Y}$ neither determines $H$ nor is statistically independent
of $H$. Unless otherwise specified, we shall assume that, conditional on $H = 0$,
the observation $\mathbf{Y}$ is of density $f_{\mathbf{Y}|H=0}(\cdot)$ and that, conditional on $H = 1$, it is of
density $f_{\mathbf{Y}|H=1}(\cdot)$. Here $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$ are nonnegative Borel measurable
functions from $\mathbb{R}^d$ to $\mathbb{R}$ that integrate to one.$^{1}$

1 Readers who are familiar with Measure Theory should note that these are densities with
respect to the Lebesgue measure on $\mathbb{R}^d$, but that the reference measure is inessential to our
analysis. We could have also chosen as our reference measure the sum of the probability measures
on $\mathbb{R}^d$ corresponding to $H = 0$ and to $H = 1$. This would have guaranteed the existence of the
densities.

Our problem is how to use the observation $\mathbf{Y}$ to intelligently guess the value of $H$.
At first we shall limit ourselves to deterministic guessing rules. Later we shall
show that no randomized guessing rule can outperform an optimal deterministic
rule. A deterministic guessing rule (or decision rule, or decoding rule) for
guessing $H$ based on $\mathbf{Y}$ is a (Borel measurable) mapping from the set of possible
observations $\mathbb{R}^d$ to the set $\{0, 1\}$. We denote such a mapping by

\[
\phi_{\mathrm{Guess}} \colon \mathbb{R}^d \to \{0, 1\} \tag{20.3}
\]

and say that $\phi_{\mathrm{Guess}}(\mathbf{y}_{\mathrm{obs}})$ is the guess we make after having observed that $\mathbf{Y} = \mathbf{y}_{\mathrm{obs}}$.
The probability of error associated with the guessing rule $\phi_{\mathrm{Guess}}(\cdot)$ is

\[
\Pr(\text{error}) \triangleq \Pr\bigl[\phi_{\mathrm{Guess}}(\mathbf{Y}) \ne H\bigr]. \tag{20.4}
\]

Note that two sources of randomness determine whether the guessing rule $\phi_{\mathrm{Guess}}(\cdot)$
errs or not: the realization of $H$ and the generation of $\mathbf{Y}$ conditional on that
realization. We say that a guessing rule is optimal if no other guessing rule
attains a smaller probability of error. (We shall later see that there always exists
an optimal guessing rule.$^{2}$) In general, there may be a number of different optimal
guessing rules. We shall therefore try to refrain from speaking of the optimal
guessing rule. We apologize if this results in cumbersome writing. The probability
of error associated with optimal guessing rules is the optimal probability of
error and is denoted throughout by

\[
p^*(\text{error}).
\]

20.3 Guessing in the Absence of Observables 

We begin with the simplest case where there are no observables. Common sense
dictates that in this case we should base our guess on the prior $(\pi_0, \pi_1)$ as follows.
If $\pi_0 > \pi_1$, then we should guess that the value of $H$ is 0; if $\pi_0 < \pi_1$, then we
should guess the value 1; and if $\pi_0 = \pi_1 = 1/2$, then it does not really matter what
we guess: the probability of error will be 1/2 either way.

To verify that this intuition is correct note that, since there are no observables,
there are only two guessing rules: the rule "guess 0" and the rule "guess 1." The
former results in the probability of error $\pi_1$ (it is in error whenever $H = 1$, which
happens with probability $\pi_1$), and the latter results in the probability of error $\pi_0$.
Hence the former rule is optimal if $\pi_0 \ge \pi_1$ and the latter is optimal when $\pi_1 \ge \pi_0$.
When $\pi_0 = \pi_1$ both rules are optimal and we can use either one.

We summarize that, in the absence of observations, an optimal guessing rule is:

\[
\phi^*_{\mathrm{Guess}} =
\begin{cases}
0 & \text{if } \Pr[H = 0] \ge \Pr[H = 1], \\
1 & \text{otherwise.}
\end{cases} \tag{20.5}
\]

(Here we guess 0 also when $\Pr[H = 0] = \Pr[H = 1]$. An equally good rule would
guess 1 in this case.)

As we next show, the error probability $p^*(\text{error})$ of this rule is

\[
p^*(\text{error}) = \min\bigl\{\Pr[H = 0], \Pr[H = 1]\bigr\}. \tag{20.6}
\]

This can be verified by considering the case where $\Pr[H = 0] \ge \Pr[H = 1]$ and the
case where $\Pr[H = 0] < \Pr[H = 1]$ separately. By (20.5), in the former case our
guess is 0 with the associated probability of error $\Pr[H = 1]$, whereas in the latter
case our guess is 1 with the associated probability of error $\Pr[H = 0]$. In either
case the probability of error is given by the RHS of (20.6).

2 Thus, while there is no such thing as "smallest strictly positive number," i.e., a positive
number that is smaller-or-equal to any other positive number, we shall see that there always
exists a guessing rule that no other guessing rule can outperform. Mathematicians paraphrase
this by saying that "the infimum of the probability of error over all the guessing rules is achievable,
i.e., is a minimum."
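The rule (20.5) and the error probability (20.6) can be spelled out in a few lines. The sketch below is only an illustration; the function names and the list of priors tried are assumptions.

```python
def guess_without_observation(pi0, pi1):
    """Rule (20.5): guess 0 if Pr[H=0] >= Pr[H=1], and guess 1 otherwise."""
    return 0 if pi0 >= pi1 else 1


def error_probability(guess, pi0, pi1):
    """A constant guess errs exactly when H takes the other value."""
    return pi1 if guess == 0 else pi0


results = []
for pi0 in (0.0, 0.3, 0.5, 0.8):
    pi1 = 1.0 - pi0
    guess = guess_without_observation(pi0, pi1)
    results.append((pi0, pi1, guess, error_probability(guess, pi0, pi1)))

for pi0, pi1, guess, p_err in results:
    # The last two printed values coincide, in agreement with (20.6).
    print(pi0, pi1, guess, p_err, min(pi0, pi1))
```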

20.4 The Joint Law of H and Y 

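Before we can extend the results of Section 20.3 to the more interesting case where
we guess $H$ after observing $\mathbf{Y}$, we pause to discuss the joint distribution of $H$
and $\mathbf{Y}$. This joint distribution is needed in order to derive an optimal decision rule
and in order to analyze its performance. Some care must be exercised in describing
this law because $H$ is discrete (binary) and $\mathbf{Y}$ has a density. It is usually simplest
to describe the joint law by describing the prior (the distribution of $H$), and by
then describing the conditional law of $\mathbf{Y}$ given $H = 0$ and the conditional law of $\mathbf{Y}$
given $H = 1$.

If, conditional on $H = 0$, the distribution of $\mathbf{Y}$ has the density $f_{\mathbf{Y}|H=0}(\cdot)$ and if,
conditional on $H = 1$, the distribution of $\mathbf{Y}$ has the density $f_{\mathbf{Y}|H=1}(\cdot)$, then the
joint distribution of $H$ and $\mathbf{Y}$ can be described using the prior $(\pi_0, \pi_1)$ (20.1) and
the conditional densities

\[
f_{\mathbf{Y}|H=0}(\cdot) \quad \text{and} \quad f_{\mathbf{Y}|H=1}(\cdot). \tag{20.7}
\]

From the prior $(\pi_0, \pi_1)$ and the conditional densities $f_{\mathbf{Y}|H=0}(\cdot)$, $f_{\mathbf{Y}|H=1}(\cdot)$ we can
compute the (unconditional) density of $\mathbf{Y}$:

\[
f_{\mathbf{Y}}(\mathbf{y}) = \pi_0\, f_{\mathbf{Y}|H=0}(\mathbf{y}) + \pi_1\, f_{\mathbf{Y}|H=1}(\mathbf{y}), \quad \mathbf{y} \in \mathbb{R}^d. \tag{20.8}
\]

The conditional distribution of $H$ given $\mathbf{Y} = \mathbf{y}_{\mathrm{obs}}$ is a bit more tricky because
the probability of $\mathbf{Y}$ taking on the value $\mathbf{y}_{\mathrm{obs}}$ (exactly) is zero. There are two
approaches to defining $\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]$ in this case: the heuristic one that is
usually used in a first course on probability theory and the measure-theoretic one
that was pioneered by Kolmogorov. Our approach is to define this quantity in a
way that will be palatable to both mathematicians and engineers and to then give
a heuristic justification for our definition.

We define the conditional probability that $H = 0$ given $\mathbf{Y} = \mathbf{y}_{\mathrm{obs}}$ as

\[
\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}] \triangleq
\begin{cases}
\dfrac{\pi_0\, f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}})}{f_{\mathbf{Y}}(\mathbf{y}_{\mathrm{obs}})} & \text{if } f_{\mathbf{Y}}(\mathbf{y}_{\mathrm{obs}}) > 0, \\[2ex]
\dfrac{1}{2} & \text{otherwise,}
\end{cases} \tag{20.9a}
\]

where $f_{\mathbf{Y}}(\cdot)$ is given in (20.8), and analogously

\[
\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}] \triangleq
\begin{cases}
\dfrac{\pi_1\, f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}})}{f_{\mathbf{Y}}(\mathbf{y}_{\mathrm{obs}})} & \text{if } f_{\mathbf{Y}}(\mathbf{y}_{\mathrm{obs}}) > 0, \\[2ex]
\dfrac{1}{2} & \text{otherwise.}
\end{cases} \tag{20.9b}
\]

Notice that our definition is meaningful in the sense that the values we assign to
$\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]$ and $\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]$ are nonnegative and sum to one:

\[
\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}] + \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}] = 1, \quad \mathbf{y}_{\mathrm{obs}} \in \mathbb{R}^d. \tag{20.10}
\]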
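Also note that our definition of $\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]$ and $\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]$
for those $\mathbf{y}_{\mathrm{obs}} \in \mathbb{R}^d$ for which $f_{\mathbf{Y}}(\mathbf{y}_{\mathrm{obs}}) = 0$ is quite arbitrary; we chose 1/2 just
for concreteness.$^{3}$ Indeed, it is not difficult to verify that the probability that $\mathbf{y}_{\mathrm{obs}}$
satisfies $\pi_0\, f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}}) + \pi_1\, f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}}) = 0$ is zero, and hence our definitions in
this eventuality are not important; see (20.12) ahead.

If $d = 1$, then the observation is a random variable $Y$ and a heuristic way to
motivate (20.9a) is to consider the limit

\[
\lim_{\delta \downarrow 0} \frac{\Pr\bigl[H = 0,\; Y \in (y_{\mathrm{obs}} - \delta, y_{\mathrm{obs}} + \delta)\bigr]}{\Pr\bigl[Y \in (y_{\mathrm{obs}} - \delta, y_{\mathrm{obs}} + \delta)\bigr]}. \tag{20.11}
\]

Assuming some regularity of the conditional densities (e.g., continuity) we can use
the approximations

\[
\Pr\bigl[H = 0,\; Y \in (y_{\mathrm{obs}} - \delta, y_{\mathrm{obs}} + \delta)\bigr] = \pi_0 \int_{y_{\mathrm{obs}} - \delta}^{y_{\mathrm{obs}} + \delta} f_{Y|H=0}(y) \,\mathrm{d}y
\approx 2 \pi_0\, \delta\, f_{Y|H=0}(y_{\mathrm{obs}}), \quad \delta \ll 1,
\]
\[
\Pr\bigl[Y \in (y_{\mathrm{obs}} - \delta, y_{\mathrm{obs}} + \delta)\bigr] = \int_{y_{\mathrm{obs}} - \delta}^{y_{\mathrm{obs}} + \delta} f_Y(y) \,\mathrm{d}y
\approx 2 \delta\, f_Y(y_{\mathrm{obs}}), \quad \delta \ll 1,
\]

to argue that, under suitable regularity conditions, (20.11) agrees with the RHS of
(20.9a) when $f_Y(y_{\mathrm{obs}}) > 0$. A similar calculation can be carried out in the vector
case where $d > 1$.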
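The heuristic limit (20.11) can be watched numerically in a scalar example. The sketch below assumes, purely for illustration, that $Y$ is Gaussian with unit variance and mean $-1$ under $H = 0$ and mean $+1$ under $H = 1$, with prior $(0.3, 0.7)$; none of these choices comes from the text. The interval probabilities are computed exactly from the Gaussian CDF (via `math.erf`), and their ratio is compared with the RHS of (20.9a).

```python
import math

PI0, PI1 = 0.3, 0.7                 # illustrative prior
MEAN0, MEAN1, STD = -1.0, 1.0, 1.0  # illustrative conditional laws of Y


def normal_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def window_ratio(y_obs, delta):
    """The ratio inside the limit (20.11) for a window of half-width delta."""
    p0 = normal_cdf(y_obs + delta, MEAN0, STD) - normal_cdf(y_obs - delta, MEAN0, STD)
    p1 = normal_cdf(y_obs + delta, MEAN1, STD) - normal_cdf(y_obs - delta, MEAN1, STD)
    return PI0 * p0 / (PI0 * p0 + PI1 * p1)


def posterior_h0(y_obs):
    """RHS of (20.9a), including the value 1/2 where the density of Y vanishes."""
    numerator = PI0 * normal_pdf(y_obs, MEAN0, STD)
    f_y = numerator + PI1 * normal_pdf(y_obs, MEAN1, STD)
    return numerator / f_y if f_y > 0 else 0.5


y_obs = 0.4
for delta in (1.0, 0.1, 0.01, 0.001):
    print(delta, window_ratio(y_obs, delta), posterior_h0(y_obs))
```

As the half-width shrinks, the printed ratio approaches the value given by (20.9a).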

We next remark on observations $\mathbf{y}_{\mathrm{obs}}$ at which the density of $\mathbf{Y}$ is zero. Accounting
for such observations makes the writing a bit cumbersome as in (20.9). Fortunately,
the probability of such observations is zero:

Note 20.4.1. Let $H$ be drawn according to the prior $(\pi_0, \pi_1)$, and let the
conditional densities of $\mathbf{Y}$ given $H$ be $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$ with $f_{\mathbf{Y}}(\cdot)$ given in
(20.8). Then

\[
\Pr\bigl[\mathbf{Y} \in \{\mathbf{y} \in \mathbb{R}^d : f_{\mathbf{Y}}(\mathbf{y}) = 0\}\bigr] = 0. \tag{20.12}
\]

Proof.

\[
\Pr\bigl[\mathbf{Y} \in \{\mathbf{y} \in \mathbb{R}^d : f_{\mathbf{Y}}(\mathbf{y}) = 0\}\bigr]
= \int_{\{\mathbf{y} \in \mathbb{R}^d : f_{\mathbf{Y}}(\mathbf{y}) = 0\}} f_{\mathbf{Y}}(\mathbf{y}) \,\mathrm{d}\mathbf{y}
= \int_{\{\mathbf{y} \in \mathbb{R}^d : f_{\mathbf{Y}}(\mathbf{y}) = 0\}} 0 \,\mathrm{d}\mathbf{y}
= 0,
\]

where the second equality follows because the integrand is zero over the range of
integration. □

3 In the measure-theoretic probability literature our definition is just a "version" (among many
others) of the conditional probabilities of the event $H = 0$ (respectively $H = 1$), conditional on
the $\sigma$-algebra generated by the random vector $\mathbf{Y}$ (Billingsley, 1995, Section 33), (Williams, 1991,
Chapter 9).
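
We conclude this section with two technical remarks which are trivial if you ignore
observations where $f_{\mathbf{Y}}(\cdot)$ is zero:

Note 20.4.2. Consider the setup of Note 20.4.1.
(i) For every $\mathbf{y} \in \mathbb{R}^d$

$$\min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\} = \min\bigl\{\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}],\, \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}]\bigr\}\, f_{\mathbf{Y}}(\mathbf{y}). \tag{20.13}$$

(ii) For every $\mathbf{y} \in \mathbb{R}^d$

$$\bigl(\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr) \iff \bigl(\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}] \ge \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}]\bigr). \tag{20.14}$$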



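Proof. Identity (20.13) can be proved using (20.9) and (20.8) by separately
considering the case $f_{\mathbf{Y}}(\mathbf{y}) > 0$ and the case $f_{\mathbf{Y}}(\mathbf{y}) = 0$ (where the latter is equivalent,
by (20.8), to $\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})$ and $\pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})$ both being zero).

To prove (20.14) we also separately consider the case $f_{\mathbf{Y}}(\mathbf{y}) > 0$ and the case
$f_{\mathbf{Y}}(\mathbf{y}) = 0$. In the former case we note that for $c > 0$ the condition $a \ge b$ is
equivalent to the condition $a/c \ge b/c$, so for $f_{\mathbf{Y}}(\mathbf{y}) > 0$

$$\bigl(\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr) \iff \Biggl(\underbrace{\frac{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})}}_{\Pr[H=0 \mid \mathbf{Y}=\mathbf{y}]} \ge \underbrace{\frac{\pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})}}_{\Pr[H=1 \mid \mathbf{Y}=\mathbf{y}]}\Biggr).$$

In the latter case where $f_{\mathbf{Y}}(\mathbf{y}) = 0$ we note that, by (20.8), both $\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})$
and $\pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})$ are zero, so the condition on the LHS of (20.14) is true ($0 \ge 0$).
Fortunately, when $f_{\mathbf{Y}}(\mathbf{y}) = 0$ the condition on the RHS of (20.14) is also true,
because in this case (20.9) implies that $\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}]$ and $\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}]$
are both equal to $1/2$ (and $1/2 \ge 1/2$). □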



20.5 Guessing after Observing Y 

We next derive an optimal rule for guessing $H$ after observing that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$.
We begin with a heuristic argument. Having observed that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, there are
only two possible decision rules: to guess 0 or guess 1. Which should we choose?
The answer now depends on the a posteriori distribution of $H$. Once it has been
revealed to us that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, our outlook changes and we now assign the event
$H = 0$ the a posteriori probability $\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$ and the event $H = 1$ the
complementary probability $\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$. If the former is greater than the
latter, then we should guess 0, and otherwise we should guess 1. Thus, after it has
been revealed to us that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ the situation is equivalent to one in which we
need to guess $H$ without any observables and where our distribution on $H$ is not
its a priori distribution (prior) but its a posteriori distribution. Using our analysis
from Section 20.3 we conclude that the guessing rule

$$\phi^*_{\text{Guess}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] \ge \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}], \\ 1 & \text{otherwise,} \end{cases} \tag{20.15}$$

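is optimal. Once again, the way we resolve ties is arbitrary: if the observation
$\mathbf{Y} = \mathbf{y}_{\text{obs}}$ results in the a posteriori distribution of $H$ being uniform, that is, if
$\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = 1/2$, then either guess is optimal.
Using Note 20.4.2 (ii) we can also express the decision rule (20.15) as

$$\phi^*_{\text{Guess}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}), \\ 1 & \text{otherwise.} \end{cases} \tag{20.16}$$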

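Conditional on $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, the probability of error of the optimal decision rule is,
in analogy to (20.6), given by

$$p^*(\text{error} \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}) = \min\bigl\{\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}],\, \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]\bigr\}, \tag{20.17}$$

as can be seen by treating the case $\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] \ge \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$ and
the complementary case $\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] < \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$ separately.

The unconditional probability of error associated with the rule (20.15) is thus

\begin{align}
p^*(\text{error}) &= \mathsf{E}\bigl[\min\bigl\{\Pr[H = 0 \mid \mathbf{Y}],\, \Pr[H = 1 \mid \mathbf{Y}]\bigr\}\bigr] \tag{20.18} \\
&= \int_{\mathbb{R}^d} \min\bigl\{\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}],\, \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}]\bigr\}\, f_{\mathbf{Y}}(\mathbf{y})\, d\mathbf{y} \tag{20.19} \\
&= \int_{\mathbb{R}^d} \min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\}\, d\mathbf{y}, \tag{20.20}
\end{align}

where the last equality follows from Note 20.4.2 (i).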

Before summarizing these conclusions in a theorem, we present the following simple 
lemma on the probabilities of error associated with general decision rules. 

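Lemma 20.5.1. Consider the setup of Note 20.4.1. Let $\phi_{\text{Guess}}(\cdot)$ be an
arbitrary guessing rule as in (20.3). Then the probabilities of error $p(\text{error} \mid H = 0)$,
$p(\text{error} \mid H = 1)$, and $p(\text{error})$ associated with $\phi_{\text{Guess}}(\cdot)$ are given by

$$p(\text{error} \mid H = 0) = \int_{\mathbf{y} \notin \mathcal{D}} f_{\mathbf{Y}|H=0}(\mathbf{y})\, d\mathbf{y}, \tag{20.21}$$

$$p(\text{error} \mid H = 1) = \int_{\mathbf{y} \in \mathcal{D}} f_{\mathbf{Y}|H=1}(\mathbf{y})\, d\mathbf{y}, \tag{20.22}$$

and

$$p(\text{error}) = \int_{\mathbb{R}^d} \Bigl(\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \notin \mathcal{D}\} + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \in \mathcal{D}\}\Bigr)\, d\mathbf{y}, \tag{20.23}$$

where

$$\mathcal{D} = \{\mathbf{y} \in \mathbb{R}^d : \phi_{\text{Guess}}(\mathbf{y}) = 0\}. \tag{20.24}$$

Proof. Conditional on $H = 0$ the guessing rule makes an error only if $\mathbf{Y}$ does not
fall in the set of observations for which $\phi_{\text{Guess}}(\cdot)$ produces the guess "$H = 0$." This
establishes (20.21). A similar argument proves (20.22). Finally, (20.23) follows
from (20.21) & (20.22) using the identity

$$p(\text{error}) = \pi_0\, p(\text{error} \mid H = 0) + \pi_1\, p(\text{error} \mid H = 1). \qquad \Box$$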

We next state the key result about binary hypothesis testing. The statement is a 
bit cumbersome because, in general, there may be many observations that result 
in H being a posteriori uniformly distributed, and an optimal decision rule can 
map each such observation to a different guess and still be optimal. 

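Theorem 20.5.2 (Optimal Binary Hypothesis Testing). Suppose that a guessing
rule $\phi^*_{\text{Guess}} \colon \mathbb{R}^d \to \{0, 1\}$ produces the guess "$H = 0$" only when $\mathbf{y}_{\text{obs}}$ is such that
$\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})$, i.e.,

$$\bigl(\phi^*_{\text{Guess}}(\mathbf{y}_{\text{obs}}) = 0\bigr) \implies \bigl(\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})\bigr), \tag{20.25a}$$

and produces the guess "$H = 1$" only when $\pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}) \ge \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})$,
i.e.,

$$\bigl(\phi^*_{\text{Guess}}(\mathbf{y}_{\text{obs}}) = 1\bigr) \implies \bigl(\pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}) \ge \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})\bigr). \tag{20.25b}$$

Then no other guessing rule has a smaller probability of error, and

$$\Pr\bigl[\phi^*_{\text{Guess}}(\mathbf{Y}) \ne H\bigr] = \int_{\mathbb{R}^d} \min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\}\, d\mathbf{y}. \tag{20.26}$$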

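Proof. Let $\phi_{\text{Guess}} \colon \mathbb{R}^d \to \{0, 1\}$ be any guessing rule, and let

$$\mathcal{D} = \{\mathbf{y} \in \mathbb{R}^d : \phi_{\text{Guess}}(\mathbf{y}) = 0\} \tag{20.27}$$

be the set of observations that result in $\phi_{\text{Guess}}(\cdot)$ producing the guess "$H = 0$."
Then the probability of error associated with $\phi_{\text{Guess}}(\cdot)$ can be lower-bounded by

$$\begin{aligned}
\Pr\bigl[\phi_{\text{Guess}}(\mathbf{Y}) \ne H\bigr] &= \int_{\mathbb{R}^d} \Bigl(\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \notin \mathcal{D}\} + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \in \mathcal{D}\}\Bigr)\, d\mathbf{y} \\
&\ge \int_{\mathbb{R}^d} \min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\}\, d\mathbf{y},
\end{aligned} \tag{20.28}$$

where the equality follows from Lemma 20.5.1 and where the inequality follows
because for every value of $\mathbf{y} \in \mathbb{R}^d$

$$\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \notin \mathcal{D}\} + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \in \mathcal{D}\} \ge \min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\}, \tag{20.29}$$

as can be verified by noting that, irrespective of the set $\mathcal{D}$, one of the two terms
$\mathrm{I}\{\mathbf{y} \in \mathcal{D}\}$ and $\mathrm{I}\{\mathbf{y} \notin \mathcal{D}\}$ is equal to one and the other is equal to zero, so the LHS of
(20.29) is either equal to $\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})$ or to $\pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})$ and hence lower-bounded
by $\min\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\}$.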

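We prove the optimality of $\phi^*_{\text{Guess}}(\cdot)$ by next showing that the probability of error
associated with $\phi^*_{\text{Guess}}(\cdot)$ is equal to the RHS of (20.28). To this end we define

$$\mathcal{D}^* = \{\mathbf{y} \in \mathbb{R}^d : \phi^*_{\text{Guess}}(\mathbf{y}) = 0\} \tag{20.30}$$

and note that if both (20.25a) and (20.25b) hold, then

$$\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \notin \mathcal{D}^*\} + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \in \mathcal{D}^*\} = \min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\}, \quad \mathbf{y} \in \mathbb{R}^d. \tag{20.31}$$

Applying Lemma 20.5.1 to the decoder $\phi^*_{\text{Guess}}(\cdot)$ we obtain

$$\begin{aligned}
\Pr\bigl[\phi^*_{\text{Guess}}(\mathbf{Y}) \ne H\bigr] &= \int_{\mathbb{R}^d} \Bigl(\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \notin \mathcal{D}^*\} + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \in \mathcal{D}^*\}\Bigr)\, d\mathbf{y} \\
&= \int_{\mathbb{R}^d} \min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\}\, d\mathbf{y},
\end{aligned} \tag{20.32}$$

where the second equality follows from (20.31). The theorem now follows from
(20.28) and (20.32). □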

Referring to a situation where the observation results in the a posteriori distribu- 
tion of H being uniform as a tie we have: 

Note 20.5.3. The fact that both conditional on $H = 0$ and conditional on $H = 1$
the observation $\mathbf{Y}$ has a density does not imply that the probability of a tie is zero.

For example, if $H$ takes value in $\{0, 1\}$ equiprobably, and if the observation $Y$ is
given by $Y = H + U$, where $U$ is uniformly distributed over the interval $[-2, 2]$
independently of $H$, then the a posteriori distribution of $H$ is uniform whenever
$Y \in [-1, 2]$, and this occurs with probability 3/4.
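
The tie probability in this example can be checked by simulation. The following is a minimal sketch, not part of the original development: the function name `tie_probability`, the sample size, and the seed are illustrative assumptions. It draws $H$ and $U$ as described in the example and counts how often the two conditional densities coincide, which under the uniform prior is exactly the tie event.

```python
import random


def tie_probability(num_samples=200_000, seed=0):
    """Estimate Pr[tie] for Y = H + U, with H uniform on {0, 1}
    and U uniform on [-2, 2] independently of H."""
    rng = random.Random(seed)
    ties = 0
    for _ in range(num_samples):
        h = rng.randint(0, 1)
        y = h + rng.uniform(-2.0, 2.0)
        # Conditional densities: 1/4 on [-2, 2] given H = 0,
        # and 1/4 on [-1, 3] given H = 1.
        f0 = 0.25 if -2.0 <= y <= 2.0 else 0.0
        f1 = 0.25 if -1.0 <= y <= 3.0 else 0.0
        # With a uniform prior, a tie occurs exactly when f0 == f1.
        if f0 == f1:
            ties += 1
    return ties / num_samples


estimate = tie_probability()
print(f"estimated tie probability: {estimate:.3f}")
```

The estimate should be close to the value 3/4 stated in the example, since the overlap interval $[-1, 2]$ has length 3 and each conditional density equals 1/4 there.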



20.6 Randomized Decision Rules 

So far we have restricted ourselves to deterministic decision rules, where the guess 
is a deterministic function of the observation. We next remove this restriction and 
allow for some randomization in the decision rule. As we shall see in this section 
and in greater generality in Section 20.11, when properly defined, randomization 
does not help: the lowest probability of error that is achievable with randomized 
decision rules can also be achieved with deterministic decision rules. 

By a randomized decision rule we mean that, after observing that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, the
guesser chooses some bias $b(\mathbf{y}_{\text{obs}}) \in [0, 1]$ and then tosses a coin of that bias.
If the result is "heads" it guesses 0 and otherwise it guesses 1. Note that the
deterministic rules we have considered before are special cases of the randomized
ones: any deterministic decision rule can be viewed as a randomized decision rule
where, depending on $\mathbf{y}_{\text{obs}}$, the bias $b(\mathbf{y}_{\text{obs}})$ is either zero or one.

Figure 20.1: A block diagram of a randomized decision rule.



Some care must be exercised in defining the joint distribution of the coin toss with
the other variables $(H, \mathbf{Y})$. We do not want to allow for "telepathic coins." That is,
we want to make sure that once $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ has been observed and the bias $b(\mathbf{y}_{\text{obs}})$
has been accordingly computed, the outcome of the coin toss is random, i.e., has
nothing to do with $H$. Probabilists would say that we require that, conditional on
$\mathbf{Y} = \mathbf{y}_{\text{obs}}$, the outcome of the coin toss be independent of $H$. (We shall discuss
conditional independence in Section 20.11.) We can clarify the setting as follows.
Upon observing the outcome $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, the guesser computes the bias $b(\mathbf{y}_{\text{obs}})$.
Using a local random number generator the guesser then draws a random variable $\Theta$
uniformly over the interval $[0, 1]$, independently of the pair $(H, \mathbf{Y})$. If the outcome $\theta$
is smaller than $b(\mathbf{y}_{\text{obs}})$, then it guesses "$H = 0$," and otherwise it guesses "$H = 1$."
A randomized decision rule is depicted in Figure 20.1.

We offer two proofs that randomized decision rules cannot outperform the best
deterministic ones. The first is by straightforward calculation. Conditional on
$\mathbf{Y} = \mathbf{y}_{\text{obs}}$, the randomized guesser makes an error either if $\Theta \le b(\mathbf{y}_{\text{obs}})$ (resulting
in the guess "$H = 0$") while $H = 1$, or if $\Theta > b(\mathbf{y}_{\text{obs}})$ (resulting in the guess
"$H = 1$") while $H = 0$. Consequently,

$$\Pr(\text{error} \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}) = b(\mathbf{y}_{\text{obs}}) \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] + \bigl(1 - b(\mathbf{y}_{\text{obs}})\bigr) \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]. \tag{20.33}$$

Thus, $\Pr(\text{error} \mid \mathbf{Y} = \mathbf{y}_{\text{obs}})$ is a weighted average of $\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$ and
$\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$. As such, irrespective of the weights, it cannot be smaller
than the minimum of the two. But, by (20.17), the optimal deterministic decision
rule (20.15) achieves just this minimum. We conclude that, irrespective of the bias,
for each outcome $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ the conditional probability of error of the randomized
decoder is lower-bounded by that of the optimal deterministic decoder (20.15).
Since this is the case for every outcome, it must also be the case when we average
over the outcomes. This concludes the first proof.
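
The weighted-average argument can be illustrated numerically. The sketch below is only an illustration under assumed values: the function name `randomized_error` and the posterior value 0.7 are hypothetical choices, not taken from the text. It evaluates the right-hand side of (20.33) over a grid of biases and compares the result with the minimum in (20.17).

```python
def randomized_error(b, post0):
    """Right-hand side of (20.33): conditional probability of error when the
    coin has bias b and Pr[H = 0 | Y = y_obs] equals post0."""
    post1 = 1.0 - post0
    return b * post1 + (1.0 - b) * post0


# Hypothetical a posteriori probability of H = 0 (an assumption for the demo).
post0 = 0.7
best_deterministic = min(post0, 1.0 - post0)  # the minimum in (20.17)

# Sweep the bias over 0, 0.1, ..., 1.
errors = [randomized_error(k / 10, post0) for k in range(11)]
for k, err in enumerate(errors):
    print(f"b = {k / 10:.1f}: Pr(error | Y = y_obs) = {err:.2f}")
```

For every bias in the grid the conditional error is at least `best_deterministic`, and the minimum is attained at a bias of zero or one, that is, by a deterministic rule.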

In the second proof we view the outcome of the local random number generator $\Theta$
as an additional observation. Since it is independent of $(H, \mathbf{Y})$ and since it is
uniform over $[0, 1]$,

$$\begin{aligned}
f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}, \theta) &= f_{\mathbf{Y}|H=0}(\mathbf{y})\, f_{\Theta|\mathbf{Y}=\mathbf{y},H=0}(\theta) \\
&= f_{\mathbf{Y}|H=0}(\mathbf{y})\, f_{\Theta}(\theta) \\
&= f_{\mathbf{Y}|H=0}(\mathbf{y})\, \mathrm{I}\{0 \le \theta \le 1\},
\end{aligned} \tag{20.34a}$$

and similarly

$$f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}, \theta) = f_{\mathbf{Y}|H=1}(\mathbf{y})\, \mathrm{I}\{0 \le \theta \le 1\}. \tag{20.34b}$$

Since the randomized decision rule can be viewed as a deterministic decision
rule that is based on the pair $(\mathbf{Y}, \Theta)$, it cannot outperform any optimal
deterministic guessing rule based on $(\mathbf{Y}, \Theta)$. But by Theorem 20.5.2 and (20.34)
it follows that the deterministic decision rule that guesses "$H = 0$" whenever
$\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})$ is optimal not only for guessing $H$ based on $\mathbf{Y}$ but
also for guessing $H$ based on $(\mathbf{Y}, \Theta)$, because it produces the guess "$H = 0$" only
when $\pi_0 f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}, \theta) \ge \pi_1 f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}, \theta)$ and it produces the guess "$H = 1$"
only when $\pi_1 f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}, \theta) \ge \pi_0 f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}, \theta)$. This concludes the second proof.

Even though randomized decision rules cannot outperform the best deterministic
rules, they may have other advantages. For example, they allow for more symmetric
ways of resolving ties. Suppose, for example, that we have no observations and that
the prior is uniform. In this case guessing "$H = 0$" will give rise to a probability of
error of 1/2, with an error occurring whenever $H = 1$. Similarly guessing "$H = 1$"
will also result in a probability of error of 1/2, this time with an error occurring
whenever $H = 0$. If we think about $H$ as being an information bit, then the former
rule makes sending 0 less error prone than sending 1. A randomized test that flips
a fair coin and guesses 0 if "heads" and 1 if "tails" gives rise to the same average
probability of error (i.e., 1/2) and makes sending 0 and sending 1 equally (highly)
error prone.

If $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ results in a tie, i.e., if it yields a uniform a posteriori distribution
on $H$,

$$\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = \frac{1}{2},$$

then the probability of error of the randomized decoder (20.33) does not depend on
the bias. In this case there is thus no loss in optimality in choosing $b(\mathbf{y}_{\text{obs}}) = 1/2$,
i.e., by employing a fair coin. This makes for a symmetric way of resolving the tie
in the a posteriori distribution of $H$.

20.7 The MAP Decision Rule 

In Section 20.5 we presented an optimal decision rule (20.15). A slight variation
on that decoder is the Maximum A Posteriori (MAP) decision rule. The MAP
rule is identical to (20.15) except in how it resolves ties. Unlike (20.15), which
resolves ties by guessing "$H = 0$," the MAP rule resolves ties by flipping a fair
coin. It can thus be summarized as follows:

$$\phi_{\text{MAP}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] > \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}], \\ 1 & \text{if } \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] < \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}], \\ \mathcal{U}(\{0, 1\}) & \text{if } \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}], \end{cases} \tag{20.35}$$

where we use "$\mathcal{U}(\{0, 1\})$" to indicate that we guess the outcome uniformly at
random.

Note that, like the rule in (20.15), the MAP rule is optimal. This follows because 
the way ties are resolved does not influence the probability of error, and because 
the MAP rule agrees with the rule (20.15) for all observations which do not result 
in a tie. 

Theorem 20.7.1 (The MAP Rule Is Optimal). The Maximum A Posteriori deci- 
sion rule (20.35) is optimal. 

Since the MAP decoder is optimal,

$$p^*(\text{error}) = \pi_0\, p_{\text{MAP}}(\text{error} \mid H = 0) + \pi_1\, p_{\text{MAP}}(\text{error} \mid H = 1), \tag{20.36}$$

where $p_{\text{MAP}}(\text{error} \mid H = 0)$ and $p_{\text{MAP}}(\text{error} \mid H = 1)$ denote the conditional
probabilities of error for the MAP decoder. Note that one can easily find guessing
rules (such as the rule "always guess 0") that yield a conditional probability of
error smaller than $p_{\text{MAP}}(\text{error} \mid H = 0)$, but one cannot find a rule whose average
probability of error outperforms the RHS of (20.36).

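Using Note 20.4.2 (ii) we can express the MAP rule in terms of the densities and
the prior as

$$\phi_{\text{MAP}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) > \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}), \\ 1 & \text{if } \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) < \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}), \\ \mathcal{U}(\{0, 1\}) & \text{if } \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) = \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}). \end{cases} \tag{20.37}$$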



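Alternatively, the MAP decision rule can be described using the likelihood-ratio
function $\text{LR}(\cdot)$, which is defined by

$$\text{LR}(\mathbf{y}) = \frac{f_{\mathbf{Y}|H=0}(\mathbf{y})}{f_{\mathbf{Y}|H=1}(\mathbf{y})}, \quad \mathbf{y} \in \mathbb{R}^d \tag{20.38}$$

using the convention

$$\Bigl(\frac{\alpha}{0} = \infty, \quad \alpha > 0\Bigr) \quad \text{and} \quad \frac{0}{0} = 1. \tag{20.39}$$

Since densities are nonnegative, and since we are defining the likelihood-ratio
function using the convention (20.39), the range of $\text{LR}(\cdot)$ is the set $[0, \infty]$ consisting of
the nonnegative reals and the special symbol $\infty$:

$$\text{LR} \colon \mathbb{R}^d \to [0, \infty].$$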
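
Using the likelihood-ratio function and (20.37), we can rewrite the MAP rule for
the case where the prior is nondegenerate (20.2) and where the observation $\mathbf{y}_{\text{obs}}$ is
such that $f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}}) > 0$ as

$$\phi_{\text{MAP}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } \text{LR}(\mathbf{y}_{\text{obs}}) > \frac{\pi_1}{\pi_0}, \\ 1 & \text{if } \text{LR}(\mathbf{y}_{\text{obs}}) < \frac{\pi_1}{\pi_0}, \\ \mathcal{U}(\{0, 1\}) & \text{if } \text{LR}(\mathbf{y}_{\text{obs}}) = \frac{\pi_1}{\pi_0}, \end{cases} \qquad \bigl(\pi_0, \pi_1 > 0, \; f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}}) > 0\bigr). \tag{20.40}$$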

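Since many of the densities that are of interest to us have an exponential form, it
is sometimes more convenient to describe the MAP rule using the log likelihood-ratio
function $\text{LLR} \colon \mathbb{R}^d \to [-\infty, \infty]$, which is defined by

$$\text{LLR}(\mathbf{y}) = \ln \frac{f_{\mathbf{Y}|H=0}(\mathbf{y})}{f_{\mathbf{Y}|H=1}(\mathbf{y})}, \quad \mathbf{y} \in \mathbb{R}^d, \tag{20.41}$$

using the convention

$$\Bigl(\ln \frac{\alpha}{0} = +\infty, \quad \ln \frac{0}{\alpha} = -\infty, \quad \alpha > 0\Bigr) \quad \text{and} \quad \ln \frac{0}{0} = 0, \tag{20.42}$$

where $\ln(\cdot)$ denotes natural logarithm.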

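Using the log likelihood-ratio function $\text{LLR}(\cdot)$ and the monotonicity of the
logarithmic function

$$(a > b) \iff (\ln a > \ln b), \quad a, b > 0, \tag{20.43}$$

we can express the MAP rule (20.40) as

$$\phi_{\text{MAP}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } \text{LLR}(\mathbf{y}_{\text{obs}}) > \ln \frac{\pi_1}{\pi_0}, \\ 1 & \text{if } \text{LLR}(\mathbf{y}_{\text{obs}}) < \ln \frac{\pi_1}{\pi_0}, \\ \mathcal{U}(\{0, 1\}) & \text{if } \text{LLR}(\mathbf{y}_{\text{obs}}) = \ln \frac{\pi_1}{\pi_0}, \end{cases} \qquad \bigl(\pi_0, \pi_1 > 0, \; f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}}) > 0\bigr). \tag{20.44}$$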
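
To make the threshold test (20.44) concrete, here is a small sketch of how it might be coded for a scalar observation. Everything named here (`llr`, `map_guess`, `gaussian_pdf`, and the Gaussian densities with means $\pm 1$ and unit variance) is an illustrative assumption rather than something prescribed by the text. The helper `llr` follows the convention (20.42), and `map_guess` assumes a nondegenerate prior.

```python
import math
import random


def llr(f0, f1, y):
    """Log likelihood-ratio (20.41) with the convention (20.42)."""
    a, b = f0(y), f1(y)
    if a == 0.0 and b == 0.0:
        return 0.0
    if b == 0.0:
        return math.inf
    if a == 0.0:
        return -math.inf
    return math.log(a) - math.log(b)


def map_guess(f0, f1, pi0, pi1, y, rng=random):
    """MAP rule in the form (20.44); requires pi0, pi1 > 0."""
    threshold = math.log(pi1 / pi0)
    value = llr(f0, f1, y)
    if value > threshold:
        return 0
    if value < threshold:
        return 1
    return rng.choice((0, 1))  # tie: guess uniformly at random


def gaussian_pdf(mean, sigma):
    """Return the density of a N(mean, sigma^2) random variable."""
    def pdf(y):
        return math.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)
    return pdf


# Hypothetical densities: mean +1 under H = 0 and mean -1 under H = 1.
f0, f1 = gaussian_pdf(1.0, 1.0), gaussian_pdf(-1.0, 1.0)

# With prior (0.9, 0.1) the threshold ln(1/9) is negative, so a mildly
# negative observation is still mapped to the guess 0.
print(map_guess(f0, f1, 0.9, 0.1, -1.0))
print(map_guess(f0, f1, 0.5, 0.5, -1.0))
```

With the uniform prior the threshold is zero and the same code reduces to comparing the two likelihoods directly.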



20.8 The ML Decision Rule 



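A different decision rule, which is typically suboptimal unless $H$ is a priori uniform,
is the Maximum-Likelihood (ML) decision rule. Its structure is similar to that
of the MAP rule except that it ignores the prior. In fact, if $\pi_0 = \pi_1$, then the two
rules are identical. The ML rule is thus given by

$$\phi_{\text{ML}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) > f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}), \\ 1 & \text{if } f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) < f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}), \\ \mathcal{U}(\{0, 1\}) & \text{if } f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) = f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}). \end{cases} \tag{20.45}$$

The ML decision rule can be alternatively described using the likelihood-ratio
function $\text{LR}(\cdot)$ (20.38) as

$$\phi_{\text{ML}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } \text{LR}(\mathbf{y}_{\text{obs}}) > 1, \\ 1 & \text{if } \text{LR}(\mathbf{y}_{\text{obs}}) < 1, \\ \mathcal{U}(\{0, 1\}) & \text{if } \text{LR}(\mathbf{y}_{\text{obs}}) = 1. \end{cases} \tag{20.46}$$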



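Alternatively, using the log likelihood-ratio function $\text{LLR}(\cdot)$ (20.41):

$$\phi_{\text{ML}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } \text{LLR}(\mathbf{y}_{\text{obs}}) > 0, \\ 1 & \text{if } \text{LLR}(\mathbf{y}_{\text{obs}}) < 0, \\ \mathcal{U}(\{0, 1\}) & \text{if } \text{LLR}(\mathbf{y}_{\text{obs}}) = 0. \end{cases} \tag{20.47}$$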



20.9 Performance Analysis: the Bhattacharyya Bound 

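We next derive the Bhattacharyya Bound, which is a useful upper bound on
the optimal probability of error $p^*(\text{error})$.

Starting with the exact expression (20.20) we obtain:

$$\begin{aligned}
p^*(\text{error}) &= \int_{\mathbb{R}^d} \min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\}\, d\mathbf{y} \\
&\le \int_{\mathbb{R}^d} \sqrt{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})}\, d\mathbf{y} \\
&= \sqrt{\pi_0 \pi_1} \int_{\mathbb{R}^d} \sqrt{f_{\mathbf{Y}|H=0}(\mathbf{y})\, f_{\mathbf{Y}|H=1}(\mathbf{y})}\, d\mathbf{y} \\
&\le \frac{1}{2} \int_{\mathbb{R}^d} \sqrt{f_{\mathbf{Y}|H=0}(\mathbf{y})\, f_{\mathbf{Y}|H=1}(\mathbf{y})}\, d\mathbf{y},
\end{aligned}$$

where the equality in the first line follows from (20.20); the inequality in the second
line from the inequality

$$\min\{a, b\} \le \sqrt{ab}, \quad a, b \ge 0, \tag{20.48}$$

(which can be easily verified by treating the case $a \ge b$ and the case $a < b$
separately); the equality in the third line by trivial algebra; and where the inequality
in the fourth line follows by noting that if $c, d \ge 0$, then their geometric mean $\sqrt{cd}$
cannot exceed their arithmetic mean $(c + d)/2$, i.e.,

$$\sqrt{cd} \le \frac{c + d}{2}, \quad c, d \ge 0, \tag{20.49}$$

and because in our case $c = \pi_0$ and $d = \pi_1$, so $c + d = 1$.
We have thus established the bound

$$p^*(\text{error}) \le \frac{1}{2} \int_{\mathbf{y} \in \mathbb{R}^d} \sqrt{f_{\mathbf{Y}|H=0}(\mathbf{y})\, f_{\mathbf{Y}|H=1}(\mathbf{y})}\, d\mathbf{y}, \tag{20.50}$$

which is known as the Bhattacharyya Bound.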



20.10 Example 

Consider the problem of guessing $H$ based on the observation $Y$, where $H$ takes
on the values 0 and 1 equiprobably and where the conditional densities of $Y$ given
$H = 0$ and $H = 1$ are

$$f_{Y|H=0}(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y - A)^2/(2\sigma^2)}, \quad y \in \mathbb{R}, \tag{20.51a}$$

$$f_{Y|H=1}(y) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y + A)^2/(2\sigma^2)}, \quad y \in \mathbb{R} \tag{20.51b}$$

for some deterministic $A, \sigma > 0$. Here the observable is a RV, so $d = 1$.

For these conditional densities the likelihood-ratio function (20.38) is given by:

$$\begin{aligned}
\text{LR}(y) &= \frac{f_{Y|H=0}(y)}{f_{Y|H=1}(y)} \\
&= \frac{\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y - A)^2/(2\sigma^2)}}{\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y + A)^2/(2\sigma^2)}} \\
&= e^{4yA/(2\sigma^2)}, \quad y \in \mathbb{R}.
\end{aligned}$$

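Since the two hypotheses are a priori equally likely, the MAP rule is equivalent to
the ML rule and both rules guess "$H = 0$" or "$H = 1$" depending on whether the
likelihood-ratio $\text{LR}(y_{\text{obs}})$ is greater or smaller than one. And since

$$\begin{aligned}
\text{LR}(y_{\text{obs}}) > 1 &\iff e^{4 y_{\text{obs}} A/(2\sigma^2)} > 1 \\
&\iff \ln\bigl(e^{4 y_{\text{obs}} A/(2\sigma^2)}\bigr) > \ln 1 \\
&\iff 4 y_{\text{obs}} A/(2\sigma^2) > 0 \\
&\iff y_{\text{obs}} > 0,
\end{aligned}$$

and

$$\begin{aligned}
\text{LR}(y_{\text{obs}}) < 1 &\iff e^{4 y_{\text{obs}} A/(2\sigma^2)} < 1 \\
&\iff \ln\bigl(e^{4 y_{\text{obs}} A/(2\sigma^2)}\bigr) < \ln 1 \\
&\iff 4 y_{\text{obs}} A/(2\sigma^2) < 0 \\
&\iff y_{\text{obs}} < 0,
\end{aligned}$$

it follows that the MAP decision rule guesses "$H = 0$," if $y_{\text{obs}} > 0$; guesses "$H = 1$,"
if $y_{\text{obs}} < 0$; and guesses "$H = 0$" or "$H = 1$" equiprobably if $y_{\text{obs}} = 0$ (i.e., in the
case of a tie).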

Note that in this example the probability of a tie is zero. Indeed, under both 
hypotheses, the probability that the observed variable Y is exactly equal to zero is 
zero: 

$$\Pr[Y = 0 \mid H = 0] = \Pr[Y = 0 \mid H = 1] = \Pr[Y = 0] = 0. \tag{20.52}$$

Consequently, the way ties are resolved is immaterial. 

We next compute the probability of error of the MAP decoder. To this end, let
$p_{\text{MAP}}(\text{error} \mid H = 0)$ and $p_{\text{MAP}}(\text{error} \mid H = 1)$ denote its conditional probabilities of
error. Its (unconditional) probability of error, which is also the optimal probability
of error, can be expressed as

$$p^*(\text{error}) = \pi_0\, p_{\text{MAP}}(\text{error} \mid H = 0) + \pi_1\, p_{\text{MAP}}(\text{error} \mid H = 1). \tag{20.53}$$

We proceed to compute the required terms on the RHS. Starting with the term
$p_{\text{MAP}}(\text{error} \mid H = 0)$, we note that $p_{\text{MAP}}(\text{error} \mid H = 0)$ corresponds to the
conditional probability that $Y$ is negative or that $Y$ is equal to zero and the coin toss
that the MAP decoder uses to resolve the tie causes the guess to be "$H = 1$." By
(20.52), the conditional probability of a tie is zero, so $p_{\text{MAP}}(\text{error} \mid H = 0)$ is, in
fact, just the conditional probability that $Y$ is negative:

$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid H = 0) &= \Pr[Y < 0 \mid H = 0] \\
&= Q\Bigl(\frac{A}{\sigma}\Bigr),
\end{aligned} \tag{20.54}$$

where the second equality follows because, conditional on $H = 0$, the random
variable $Y$ is $\mathcal{N}(A, \sigma^2)$, and the probability that it is smaller than zero can be thus
computed using the Q-function as in (19.12b). Similarly,

$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid H = 1) &= \Pr[Y > 0 \mid H = 1] \\
&= Q\Bigl(\frac{A}{\sigma}\Bigr).
\end{aligned} \tag{20.55}$$

Note that in this example the MAP rule is "fair" in the sense that the conditional
probability of error given $H = 0$ is the same as given $H = 1$. This is a coincidence
(that results from the symmetry in the problem). In general, the MAP rule need
not be fair.

We conclude from (20.53), (20.54), and (20.55) that

$$p^*(\text{error}) = Q\Bigl(\frac{A}{\sigma}\Bigr). \tag{20.56}$$

Figure 20.2 depicts the conditional densities of $Y$ given $H = 0$ and given $H = 1$
and the decision regions of the MAP decision rule $\phi_{\text{MAP}}(\cdot)$. The area of the shaded
region is the probability of an error conditioned on $H = 0$.

Note that the optimal decision rule for this example is not unique. Another optimal
decision rule is to guess "$H = 0$" if $y_{\text{obs}}$ is positive but not equal to 17, and to
guess "$H = 1$" otherwise.

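Even though we have an exact expression for the probability of error (20.56) it is
instructive to compute the Bhattacharyya Bound too:

$$\begin{aligned}
p^*(\text{error}) &\le \frac{1}{2} \int_{-\infty}^{\infty} \sqrt{f_{Y|H=0}(y)\, f_{Y|H=1}(y)}\, dy \\
&= \frac{1}{2} \int_{-\infty}^{\infty} \sqrt{\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y - A)^2/(2\sigma^2)}\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y + A)^2/(2\sigma^2)}}\, dy \\
&= \frac{1}{2}\, e^{-A^2/(2\sigma^2)} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-y^2/(2\sigma^2)}\, dy \\
&= \frac{1}{2}\, e^{-A^2/(2\sigma^2)},
\end{aligned} \tag{20.57}$$

where the first line follows from (20.50); the second from (20.51); the third by simple
algebra; and the final equality because the Gaussian density (like all densities)
integrates to one.

[Figure 20.2 shows the densities $f_{Y|H=0}(y)$ and $f_{Y|H=1}(y)$, the decision regions
Guess "$H = 1$" and Guess "$H = 0$," and the shaded area $p_{\text{MAP}}(\text{error} \mid H = 0)$.]

Figure 20.2: Binary hypothesis testing with a uniform prior. Conditional on $H = 0$
the observable $Y$ is $\mathcal{N}(A, \sigma^2)$ and conditional on $H = 1$ it is $\mathcal{N}(-A, \sigma^2)$. The area
of the shaded region is the probability of error of the MAP rule conditional on
$H = 0$.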

As an aside, we have from (20.57) and (20.56) the bound

$$Q(\alpha) \le \frac{1}{2}\, e^{-\alpha^2/2}, \quad \alpha \ge 0, \tag{20.58}$$

which we encountered in Proposition 19.4.2.
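
The exact error probability (20.56), the integral expression (20.20), and the Bhattacharyya Bound (20.57) can be compared numerically for this example. The sketch below is an illustration under assumptions: the values $A = \sigma = 1$, the helper names, and the midpoint-rule integration over a truncated range are all choices made here, and the Q-function is computed from the standard identity $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def gaussian_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)


def optimal_error_numeric(A, sigma, num_steps=20_000):
    """Midpoint-rule approximation of (20.20) for the uniform prior,
    integrating over [-A - 10*sigma, A + 10*sigma]."""
    lo, hi = -A - 10.0 * sigma, A + 10.0 * sigma
    step = (hi - lo) / num_steps
    total = 0.0
    for k in range(num_steps):
        y = lo + (k + 0.5) * step
        total += min(0.5 * gaussian_pdf(y, A, sigma), 0.5 * gaussian_pdf(y, -A, sigma))
    return total * step


# Hypothetical parameter values chosen for the demonstration.
A, sigma = 1.0, 1.0
exact = q_function(A / sigma)                          # (20.56)
numeric = optimal_error_numeric(A, sigma)              # (20.20)
bound = 0.5 * math.exp(-A ** 2 / (2.0 * sigma ** 2))   # (20.57)

print(f"Q(A/sigma)          = {exact:.6f}")
print(f"integral of the min = {numeric:.6f}")
print(f"Bhattacharyya Bound = {bound:.6f}")
```

The first two numbers should agree up to the integration error, and both should lie below the third, in line with (20.58).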



20.11 (Nontelepathic) Processing 



To further emphasize the optimality of the Maximum A Posteriori decision rule, 
and for ulterior motives that have to do with the introduction of conditional inde- 
pendence, we shall next show that no processing of the observables can reduce the 
probability of a guessing error. To that end we shall have to properly define what 
we mean by "processing." 

The first thing that comes to mind is to consider processing as the application of
some deterministic mapping. I.e., we think of mapping the observation $\mathbf{y}_{\text{obs}}$ using
some deterministic function $g(\cdot)$ to $g(\mathbf{y}_{\text{obs}})$ and then guessing $H$ based on $g(\mathbf{y}_{\text{obs}})$.
That this cannot reduce the probability of error is clear from Figure 20.3, which
demonstrates that mapping $\mathbf{y}_{\text{obs}}$ to $g(\mathbf{y}_{\text{obs}})$ and then guessing $H$ based on $g(\mathbf{y}_{\text{obs}})$
can be viewed as a special case of guessing $H$ based on $\mathbf{y}_{\text{obs}}$ and, as such, cannot
outperform the MAP decision rule, which is optimal among all decision rules based
on $\mathbf{y}_{\text{obs}}$.

[Figure 20.3 shows a block diagram: $\mathbf{y}_{\text{obs}}$ enters the block $g(\cdot)$, whose output
$g(\mathbf{y}_{\text{obs}})$ enters the block "Guess $H$ based on $g(\mathbf{y}_{\text{obs}})$," which produces the guess.]

Figure 20.3: No decision rule based on $g(\mathbf{y}_{\text{obs}})$ can outperform an optimal decision
rule based on $\mathbf{y}_{\text{obs}}$, because computing $g(\mathbf{y}_{\text{obs}})$ and then forming the decision based
on the answer can be viewed as a special case of guessing based on $\mathbf{y}_{\text{obs}}$.

A more general kind of processing involves randomization, or "dithering." Here we
envision the processor as using a local random number generator to generate a
random variable $\Theta$ and then producing an output of the form $g(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})$, where $\theta_{\text{obs}}$
is the outcome of $\Theta$, and where $g(\cdot)$ is some deterministic function. Here $\Theta$ is
assumed to be independent of the pair $(H, \mathbf{Y})$, so the processor can generate it
using a local random number generator.

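An argument very similar to the one we used in Section 20.6 (in the second proof of
the claim that randomized decision rules cannot outperform optimal deterministic
rules) can be used to show that this type of processing cannot improve our guessing.
The argument is as follows. We view the application of the function $g(\cdot)$ to the
pair $(\mathbf{Y}, \Theta)$ as deterministic processing of the pair $(\mathbf{Y}, \Theta)$, so no decision rule based
on $g(\mathbf{Y}, \Theta)$ can outperform a decision rule that is optimal for guessing $H$ based
on $(\mathbf{Y}, \Theta)$. It thus remains to show that the decision rule 'Guess "$H = 0$" if
$\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})$' is also optimal when observing $(\mathbf{Y}, \Theta)$ and not
only $\mathbf{Y}$. This follows from Theorem 20.5.2 by noting that the independence of $\Theta$
and $(H, \mathbf{Y})$ implies that

$$\begin{aligned}
f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) &= f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})\, f_{\Theta}(\theta_{\text{obs}}), \\
f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) &= f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})\, f_{\Theta}(\theta_{\text{obs}}),
\end{aligned}$$

and hence that this rule guesses "$H = 0$" only when $\mathbf{y}_{\text{obs}}$ and $\theta_{\text{obs}}$ are such that
$\pi_0 f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) \ge \pi_1 f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})$ and guesses "$H = 1$" only when
$\pi_1 f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) \ge \pi_0 f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})$.

Fearless readers who are not afraid to divide by zero should note that

$$\begin{aligned}
\text{LR}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) &= \frac{f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})}{f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})} \\
&= \frac{f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})\, f_{\Theta}(\theta_{\text{obs}})}{f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})\, f_{\Theta}(\theta_{\text{obs}})} \\
&= \frac{f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})}{f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})}, \quad f_{\Theta}(\theta_{\text{obs}}) \ne 0 \\
&= \text{LR}(\mathbf{y}_{\text{obs}}), \quad f_{\Theta}(\theta_{\text{obs}}) \ne 0,
\end{aligned}$$

so (ignoring some technical issues; see footnote 4) the MAP detector based on
$(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})$ ignores $\theta_{\text{obs}}$ and is identical to the MAP detector based on $\mathbf{y}_{\text{obs}}$ only.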

Ostensibly more general is processing $\mathbf{Y}$ by mapping it to $g(\mathbf{Y}, \Theta)$, where the
distribution of $\Theta$ is allowed to depend on $\mathbf{y}_{\text{obs}}$. This motivates us to further extend
the notion of processing. The cleanest way to define processing is to define its
outcome rather than the way it is generated.

Before defining processing we remind the reader of the notion of conditional inde- 
pendence. But first we recall the definition of (unconditional) independence. We 
do so for discrete random variables using their Probability Mass Function (PMF). 
The extension to random variables with a joint density is straightforward. For the 
definition of independence in more general scenarios see, for example, (Billingsley, 
1995, Section 20) or (Loeve, 1963, Section 15) or (Williams, 1991, Chapter 4). 

Definition 20.11.1 (Independent Discrete Random Variables). We say that the
discrete random variables $X$ and $Y$ of joint PMF $P_{X,Y}(\cdot, \cdot)$ and marginal PMFs
$P_X(\cdot)$ and $P_Y(\cdot)$ are independent if $P_{X,Y}(\cdot, \cdot)$ factors as

$$P_{X,Y}(x, y) = P_X(x)\, P_Y(y). \tag{20.59}$$

Equivalently, $X$ and $Y$ are independent if, for every outcome $y$ such that $P_Y(y) > 0$,
the conditional distribution of $X$ given $Y = y$ is the same as its unconditional
distribution:

$$P_{X|Y}(x|y) = P_X(x), \quad P_Y(y) > 0. \tag{20.60}$$

Equivalently, $X$ and $Y$ are independent if, for every outcome $x$ such that $P_X(x) > 0$,
the conditional distribution of $Y$ given $X = x$ is the same as its unconditional
distribution:

$$P_{Y|X}(y|x) = P_Y(y), \quad P_X(x) > 0. \tag{20.61}$$

The equivalence of (20.59) and (20.60) follows because, by the definition of the
conditional probability mass function,

$$P_{X|Y}(x|y) = \frac{P_{X,Y}(x, y)}{P_Y(y)}, \quad P_Y(y) > 0.$$

Similarly, the equivalence of (20.59) and (20.61) follows from

$$P_{Y|X}(y|x) = \frac{P_{X,Y}(x, y)}{P_X(x)}, \quad P_X(x) > 0.$$

The beauty of (20.59) is that it is symmetric in $X$, $Y$. It makes it clear that $X$
and $Y$ are independent if, and only if, $Y$ and $X$ are independent. This is not
obvious from (20.60) or (20.61).



4 Technical issues arise when the outcome of $\Theta$, namely $\theta_{\text{obs}}$, is such that $f_{\Theta}(\theta_{\text{obs}}) = 0$.

The definition of the conditional independence of $X$ and $Y$ given $Z$ is similar, except
that we condition everywhere on $Z$. Again we only consider the discrete case and
refer the reader to (Loeve, 1963, Section 25.3) and (Chung, 2001, Section 9.2) for
the general case.

Definition 20.11.2 (Conditionally Independent Discrete Random Variables). Let
the discrete random variables $X$, $Y$, $Z$ have a joint PMF $P_{X,Y,Z}(\cdot, \cdot, \cdot)$. We say that
$X$ and $Y$ are conditionally independent given $Z$ and write

X⊸−Z⊸−Y

if

$$P_{X,Y|Z}(x, y|z) = P_{X|Z}(x|z)\, P_{Y|Z}(y|z), \quad P_Z(z) > 0. \tag{20.62}$$

Equivalently, $X$ and $Y$ are conditionally independent given $Z$ if, for any outcome
$y, z$ with $P_{Y,Z}(y, z) > 0$, the conditional distribution of $X$ given that $Y = y$ and
$Z = z$ is the same as the distribution of $X$ when conditioned on $Z = z$ only:

$$P_{X|Y,Z}(x|y, z) = P_{X|Z}(x|z), \quad P_{Y,Z}(y, z) > 0. \tag{20.63}$$

Or, equivalently, $X$ and $Y$ are conditionally independent given $Z$ if

$$P_{Y|X,Z}(y|x, z) = P_{Y|Z}(y|z), \quad P_{X,Z}(x, z) > 0. \tag{20.64}$$

The equivalence of (20.62) and (20.63) follows because, by the definition of the
conditional probability mass function,

$$\begin{aligned}
P_{X|Y,Z}(x|y, z) &= \frac{P_{X,Y,Z}(x, y, z)}{P_{Y,Z}(y, z)} \\
&= \frac{P_{X,Y|Z}(x, y|z)\, P_Z(z)}{P_{Y|Z}(y|z)\, P_Z(z)} \\
&= \frac{P_{X,Y|Z}(x, y|z)}{P_{Y|Z}(y|z)}, \quad P_{Y,Z}(y, z) > 0,
\end{aligned}$$

and similarly the equivalence of (20.62) and (20.64) follows from

$$P_{Y|X,Z}(y|x, z) = \frac{P_{X,Y|Z}(x, y|z)}{P_{X|Z}(x|z)}, \quad P_{X,Z}(x, z) > 0.$$

Again, the beauty of (20.62) is that it is symmetric in $X$, $Y$. Thus X⊸−Z⊸−Y if,
and only if, Y⊸−Z⊸−X. When $X$ and $Y$ are conditionally independent given $Z$
we sometimes say that X⊸−Z⊸−Y forms a Markov chain.
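
For a small joint PMF, condition (20.62) can be checked by brute force. The sketch below is illustrative only: the dictionary representation of the PMF, the function names `is_conditionally_independent` and `build_chain`, and the two toy distributions are assumptions introduced here. The first toy distribution passes a fair bit $Z$ through two independent binary symmetric channels to produce $X$ and $Y$; the second sets $Y$ to the exclusive-or of two independent fair bits $X$ and $Z$.

```python
from itertools import product


def is_conditionally_independent(joint, tol=1e-12):
    """Check (20.62) for a joint PMF given as joint[(x, y, z)] = P(x, y, z)."""
    xs = {x for x, _, _ in joint}
    ys = {y for _, y, _ in joint}
    zs = {z for _, _, z in joint}
    for z in zs:
        pz = sum(joint.get((x, y, z), 0.0) for x, y in product(xs, ys))
        if pz == 0.0:
            continue  # (20.62) is only required when P_Z(z) > 0
        for x, y in product(xs, ys):
            pxy_z = joint.get((x, y, z), 0.0) / pz
            px_z = sum(joint.get((x, yy, z), 0.0) for yy in ys) / pz
            py_z = sum(joint.get((xx, y, z), 0.0) for xx in xs) / pz
            if abs(pxy_z - px_z * py_z) > tol:
                return False
    return True


def build_chain(eps_x, eps_y):
    """Z is a fair bit; X and Y are Z flipped independently with
    probabilities eps_x and eps_y respectively."""
    joint = {}
    for x, y, z in product((0, 1), repeat=3):
        px = 1.0 - eps_x if x == z else eps_x
        py = 1.0 - eps_y if y == z else eps_y
        joint[(x, y, z)] = 0.5 * px * py
    return joint


chain = build_chain(0.1, 0.2)

# Counterexample: X and Z are independent fair bits and Y = X xor Z, so
# given Z the value of Y is determined by X.
xor_joint = {(x, x ^ z, z): 0.25 for x, z in product((0, 1), repeat=2)}

print(is_conditionally_independent(chain))
print(is_conditionally_independent(xor_joint))
```

The first check succeeds because, given $Z$, the two channel flips are independent; the second fails because knowing $X$ and $Z$ pins down $Y$.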

The equivalence between the different definitions of conditional independence con- 
tinues to hold in the general case where the random variables are not necessarily 
discrete. We only reluctantly state this as a theorem, because we never defined 
conditional independence in nondiscrete settings. 

Theorem 20.11.3 (Equivalent Definition for Conditional Independence). Let $\mathbf{X}$,
$\mathbf{Y}$, and $\mathbf{Z}$ be random vectors. Then the following statements are equivalent:

(a) $\mathbf{X}$ and $\mathbf{Y}$ are conditionally independent given $\mathbf{Z}$.

(b) The conditional distribution of $\mathbf{Y}$ given $(\mathbf{X}, \mathbf{Z})$ is equal to its conditional
distribution given $\mathbf{Z}$.

(c) The conditional distribution of $\mathbf{X}$ given $(\mathbf{Z}, \mathbf{Y})$ is equal to its conditional
distribution given $\mathbf{Z}$.

Proof. For a precise definition of concepts appearing in this theorem and for a
proof of the equivalence between the statements see (Loeve, 1963, Section 25.3)
and particularly Theorem 25.3A therein. □

We are now ready to define the processing of the observation Y with respect to 
the hypothesis H. 

Definition 20.11.4 (Processing). We say that Z is the result of processing Y 
with respect to H if H and Z are conditionally independent given Y. 

As we next show, this definition of processing extends the previous ones. We
first show that if $\mathbf{Z} = g(\mathbf{Y})$ for some deterministic Borel measurable function $g(\cdot)$
then H⊸−Y⊸−g(Y). This follows by noting that, conditional on $\mathbf{Y}$, the random
variable $g(\mathbf{Y})$ is deterministic and hence independent of everything and a fortiori
of $H$.

We next show that if $\Theta$ is independent of $(H, \mathbf{Y})$, then H⊸−Y⊸−g(Y,Θ).
Indeed, if $\mathbf{Z} = g(\mathbf{Y}, \Theta)$ with $\Theta$ being independent of $(\mathbf{Y}, H)$, then, conditionally on
$\mathbf{Y} = \mathbf{y}$, the distribution of $\mathbf{Z}$ is simply the distribution of $g(\mathbf{y}, \Theta)$ so (under this
conditioning) $\mathbf{Z}$ is independent of $H$.

We next show that processing the observables cannot help decrease the probability 
of error. The proof is conceptually very simple; the neat part is in the definition. 

Theorem 20.11.5 (Processing Is Futile). If Z is the result of processing Y with 
respect to H , then no rule for guessing H based on Z can outperform an optimal 
guessing rule based on Y. 

Proof. Surely no decision rule that guesses H based on Z can outperform an
optimal decision rule based on Z, let alone outperform a decision rule that is
optimal for guessing H based on Z and Y. But an optimal decision rule based on
the pair (Z, Y) is the MAP rule, which compares

\[
\Pr[H = 0 \mid Y = y, Z = z] \quad\text{and}\quad \Pr[H = 1 \mid Y = y, Z = z].
\]

And, because H⊸−Y⊸−Z, it follows from Theorem 20.11.3 that this is equivalent
to comparing

\[
\Pr[H = 0 \mid Y = y] \quad\text{and}\quad \Pr[H = 1 \mid Y = y],
\]

i.e., to an optimal (MAP) decision rule based on Y only. □

The above theorem is more powerful than it seems. To demonstrate its strength,
we next use it to show that in testing for a signal in Gaussian noise, irrespective of
the prior, the optimal probability of error is monotonically nondecreasing in the
noise variance. The setup we consider is one where H is of prior $(\pi_0, \pi_1)$ and aiding
us in guessing H is the observable Y, which, conditional on H = m, is $\mathcal{N}(\alpha_m, \sigma^2)$
for $m \in \{0, 1\}$. We shall argue that, irrespective of the prior $(\pi_0, \pi_1)$, the optimal
probability of error is monotonically nondecreasing in $\sigma^2$.

The beauty of the argument is that it allows us to prove the monotonicity result 
without having to calculate the optimal probability of error explicitly (as we did 
in Section 20.10 for the case of a uniform prior with $\alpha_0 = A$ and $\alpha_1 = -A$). While
we could also compute the optimal probability of error for this more general setup 
and then use calculus to derive the monotonicity result, the argument we present 
instead has the advantage of also being applicable to multi-dimensional multi- 
hypothesis testing scenarios, where there is typically no closed-form expression for 
the optimal probability of error. 

To prove this result, let $p^*_e(\sigma^2)$ denote the optimal probability of error as a function
of $\sigma^2$. We need to show that $p^*_e(\sigma^2) \le p^*_e(\sigma^2 + \delta^2)$ for all $\delta \in \mathbb{R}$. Consider the
low-noise case where the conditional law of Y given H is $\mathcal{N}(\alpha_m, \sigma^2)$. Suppose that
the receiver generates $W \sim \mathcal{N}(0, \delta^2)$ independently of (H, Y) and adds W to Y
to form Z = Y + W. Since Z is the result of processing Y with respect to H, it
follows that the optimal probability of error based on Y, namely $p^*_e(\sigma^2)$, is at least
as good as the optimal probability of error based on Z (Theorem 20.11.5). We
now complete the argument by showing that the optimal probability of error based
on Z is $p^*_e(\sigma^2 + \delta^2)$. This follows because, by Proposition 19.7.2, the conditional
law of Z given H is $\mathcal{N}(\alpha_m, \sigma^2 + \delta^2)$.

Stated differently, since using a local random number generator the receiver can
produce from an observation Y of conditional law $\mathcal{N}(\alpha_m, \sigma^2)$ a random variable Z
whose conditional law is $\mathcal{N}(\alpha_m, \sigma^2 + \delta^2)$, the minimal probability of error based
on an observation having conditional law $\mathcal{N}(\alpha_m, \sigma^2)$ cannot be larger than the
optimal probability of error achievable based on an observation having conditional
law $\mathcal{N}(\alpha_m, \sigma^2 + \delta^2)$. See Figure 20.4 for an illustration of this argument.
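The monotonicity claim can also be checked numerically. The following Python sketch is only an illustration under stated assumptions: it uses the standard identity that the optimal (MAP) probability of error equals the integral of min{π0 f0(y), π1 f1(y)}, which is not derived in this section, and it approximates that integral with a plain trapezoidal rule. The function names, the means ±1, and the prior 0.3 are hypothetical choices made for the example.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of a N(mean, var) random variable evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def optimal_error(alpha0, alpha1, var, pi0, n_steps=20000):
    """Approximate the optimal error probability as the integral of
    min(pi0 * f0, pi1 * f1), using the trapezoidal rule on a wide interval."""
    pi1 = 1.0 - pi0
    sd = math.sqrt(var)
    lo = min(alpha0, alpha1) - 10.0 * sd
    hi = max(alpha0, alpha1) + 10.0 * sd
    h = (hi - lo) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        y = lo + k * h
        v = min(pi0 * gaussian_pdf(y, alpha0, var),
                pi1 * gaussian_pdf(y, alpha1, var))
        # End points carry half weight in the trapezoidal rule.
        total += v if 0 < k < n_steps else 0.5 * v
    return total * h


# Illustrative (assumed) parameters: means +1 and -1, prior (0.3, 0.7).
variances = [0.25, 0.5, 1.0, 2.0, 4.0]
errors = [optimal_error(1.0, -1.0, v, pi0=0.3) for v in variances]
for v, e in zip(variances, errors):
    print(f"variance = {v:4.2f}   optimal error ~ {e:.4f}")
```

The printed values should be nondecreasing in the variance, in agreement with the argument above; the sketch checks the conclusion numerically but does not replace the proof.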

20.12 Sufficient Statistics 

This section affords a first glance at the notion of sufficient statistics, which will be 
studied in greater depth and generality in Chapter 22. We begin with the following 
example. Consider the hypothesis testing problem with a uniform prior, where the
observation is a tuple of real numbers $(Y_1, Y_2)$. Conditional on H = 0, the random
variables $Y_1, Y_2$ are IID $\mathcal{N}(0, \sigma_0^2)$, whereas conditional on H = 1 they are IID
$\mathcal{N}(0, \sigma_1^2)$, where

\[
\sigma_0 > \sigma_1 > 0. \tag{20.65}
\]

(If $\sigma_0^2 = \sigma_1^2$, then the problem is boring in that the conditional law of the observable
given H = 0 is the same as given H = 1, so the two hypotheses cannot be
differentiated. For $\sigma_0^2 \neq \sigma_1^2$ there is no loss in generality in assuming $\sigma_0 > \sigma_1$ because
we can always relabel the hypotheses. And if $\sigma_0 > \sigma_1 = 0$, then the problem is
trivial: we guess "H = 1" only if $Y_1 = Y_2 = 0$.) Thus, the observation space is the
two-dimensional Euclidean space $\mathbb{R}^2$ and, using the explicit form of the Gaussian
density, the conditional densities of the observation are

\[
\begin{aligned}
f_{Y_1,Y_2|H=0}(y_1,y_2) &= \frac{1}{2\pi\sigma_0^2} \exp\Bigl(-\frac{1}{2\sigma_0^2}\bigl(y_1^2 + y_2^2\bigr)\Bigr), \\
f_{Y_1,Y_2|H=1}(y_1,y_2) &= \frac{1}{2\pi\sigma_1^2} \exp\Bigl(-\frac{1}{2\sigma_1^2}\bigl(y_1^2 + y_2^2\bigr)\Bigr),
\qquad y_1, y_2 \in \mathbb{R}.
\end{aligned} \tag{20.66}
\]

Since we assumed a uniform prior, the ML decoding rule for guessing H based on
the tuple $(Y_1, Y_2)$ is optimal. To derive the ML rule explicitly, we compute the
likelihood-ratio function



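\[
\begin{aligned}
\mathrm{LR}(y_1,y_2) &= \frac{f_{Y_1,Y_2|H=0}(y_1,y_2)}{f_{Y_1,Y_2|H=1}(y_1,y_2)} \\
&= \frac{\frac{1}{2\pi\sigma_0^2} \exp\bigl(-\frac{1}{2\sigma_0^2}(y_1^2 + y_2^2)\bigr)}
        {\frac{1}{2\pi\sigma_1^2} \exp\bigl(-\frac{1}{2\sigma_1^2}(y_1^2 + y_2^2)\bigr)} \\
&= \frac{\sigma_1^2}{\sigma_0^2} \exp\Bigl(\frac{1}{2}\Bigl(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\Bigr)\bigl(y_1^2 + y_2^2\bigr)\Bigr),
\qquad y_1, y_2 \in \mathbb{R}.
\end{aligned} \tag{20.67}
\]

Thus,

\[
\begin{aligned}
\mathrm{LR}(y_1,y_2) > 1
&\iff \exp\Bigl(\frac{1}{2}\Bigl(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\Bigr)\bigl(y_1^2 + y_2^2\bigr)\Bigr) > \frac{\sigma_0^2}{\sigma_1^2} \\
&\iff \frac{1}{2}\Bigl(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\Bigr)\bigl(y_1^2 + y_2^2\bigr) > \ln\frac{\sigma_0^2}{\sigma_1^2} \\
&\iff \frac{\sigma_0^2 - \sigma_1^2}{2\sigma_0^2\sigma_1^2}\bigl(y_1^2 + y_2^2\bigr) > \ln\frac{\sigma_0^2}{\sigma_1^2} \\
&\iff y_1^2 + y_2^2 > \frac{2\sigma_0^2\sigma_1^2}{\sigma_0^2 - \sigma_1^2} \ln\frac{\sigma_0^2}{\sigma_1^2},
\end{aligned} \tag{20.68}
\]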

where the second equivalence follows from the monotonicity of the logarithm
function (20.43); and where the last equivalence follows by multiplying both sides of
the inequality by the constant $2\sigma_0^2\sigma_1^2/(\sigma_0^2 - \sigma_1^2)$ (without the need to change the
inequality direction because this constant is by (20.65) positive).

It follows from (20.68) that the ML decision rule for guessing H based on $(Y_1, Y_2)$
computes $Y_1^2 + Y_2^2$ and then compares the result to a threshold. It is interesting to
note that to implement this decision rule one need not observe $Y_1$ and $Y_2$ directly;
it suffices to observe the sum of their squares

\[
T = Y_1^2 + Y_2^2. \tag{20.69}
\]

Of course, being the result of processing $(Y_1, Y_2)$ with respect to H, no guess of H
based on T can outperform an optimal guess based on $(Y_1, Y_2)$ (Section 20.11).
But what is interesting about this example is that, even though one cannot recover
$(Y_1, Y_2)$ from T (so there are some decision rules based on $(Y_1, Y_2)$ that cannot
be implemented if one only knows T), the ML rule based on $(Y_1, Y_2)$ only requires
knowledge of T. Thus, in this example, even though pre-processing the observations
to produce $T = Y_1^2 + Y_2^2$ is not reversible, basing one's decision on T incurs no loss
in optimality: an optimal decision rule based on T is just as good as an optimal
rule based on $(Y_1, Y_2)$.

The reason for this can be traced to the fact that, in this example, to compute the
likelihood-ratio $\mathrm{LR}(y_1, y_2)$ one need not know the pair $(y_1, y_2)$; it suffices that one
know the sum of their squares $y_1^2 + y_2^2$; see (20.67). In this sense $T = Y_1^2 + Y_2^2$
forms a sufficient statistic for guessing H from $(Y_1, Y_2)$, as we next define.
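To make the example concrete, here is a short Python sketch of the likelihood ratio of (20.67) and of the threshold rule of (20.68). It is an illustration, not part of the text's development: the function names and the values σ0 = 2, σ1 = 1 are assumptions chosen for the example, and ties (which occur with probability zero) are resolved in favor of "H = 1" purely for simplicity.

```python
import math


def likelihood_ratio(y1, y2, sigma0, sigma1):
    """LR(y1, y2) of (20.67); it depends on (y1, y2) only through y1^2 + y2^2."""
    t = y1 * y1 + y2 * y2
    return (sigma1 ** 2 / sigma0 ** 2) * math.exp(
        0.5 * (1.0 / sigma1 ** 2 - 1.0 / sigma0 ** 2) * t)


def ml_threshold(sigma0, sigma1):
    """Right-hand side of the last line of (20.68); requires sigma0 > sigma1 > 0."""
    v0, v1 = sigma0 ** 2, sigma1 ** 2
    return 2.0 * v0 * v1 / (v0 - v1) * math.log(v0 / v1)


def ml_guess_from_t(t, sigma0, sigma1):
    """ML guess of H based only on t = y1^2 + y2^2 (ties go to 1 here)."""
    return 0 if t > ml_threshold(sigma0, sigma1) else 1


sigma0, sigma1 = 2.0, 1.0  # assumed values satisfying (20.65)
for y1, y2 in [(0.5, 0.5), (2.0, 0.0), (3.0, 4.0), (5.0, 0.0)]:
    t = y1 * y1 + y2 * y2
    print((y1, y2), t, likelihood_ratio(y1, y2, sigma0, sigma1),
          ml_guess_from_t(t, sigma0, sigma1))
```

The observations (3, 4) and (5, 0) have the same sum of squares and therefore the same likelihood ratio and the same guess, which is the sense in which T loses nothing relevant to the decision.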

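We would like to define a mapping T(·) from the observation space $\mathbb{R}^d$ to $\mathbb{R}^{d'}$ as
being sufficient for the densities $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$ if the likelihood-ratio
$\mathrm{LR}(\mathbf{y}_{\mathrm{obs}})$ can be computed from $T(\mathbf{y}_{\mathrm{obs}})$ for every $\mathbf{y}_{\mathrm{obs}}$ in $\mathbb{R}^d$. However, for
technical reasons, we require slightly less: we only require that $\mathrm{LR}(\mathbf{y}_{\mathrm{obs}})$ be computable
from $T(\mathbf{y}_{\mathrm{obs}})$ for those observations $\mathbf{y}_{\mathrm{obs}}$ for which at least one of the densities is
positive (so the likelihood-ratio is not of the form 0/0) and that additionally lie
outside some prespecified set $\mathcal{Y}_0 \subset \mathbb{R}^d$ of Lebesgue measure zero.⁵ Thus, we shall
require that there exist a set $\mathcal{Y}_0 \subset \mathbb{R}^d$ of Lebesgue measure zero and a function
$\zeta\colon \mathbb{R}^{d'} \to [0, \infty]$ such that $\zeta(T(\mathbf{y}_{\mathrm{obs}}))$ is equal to $\mathrm{LR}(\mathbf{y}_{\mathrm{obs}})$ whenever

\[
\mathbf{y}_{\mathrm{obs}} \notin \mathcal{Y}_0 \quad\text{and}\quad f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}}) + f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}}) > 0. \tag{20.70}
\]

Note that the fact that $\mathcal{Y}_0$ is of Lebesgue measure zero implies that

\[
\Pr[\mathbf{Y} \in \mathcal{Y}_0 \mid H = 0] = \Pr[\mathbf{Y} \in \mathcal{Y}_0 \mid H = 1] = 0. \tag{20.71}
\]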



⁵ We allow this exception set so that the question of whether T(·) forms a sufficient statistic
or not will not depend on our choice of the density function of the conditional distribution of the
observable. (Recall that if a RV has a probability density function, then it has infinitely many
different probability density functions, every two of which differ on a set of Lebesgue measure
zero.)

To convince the reader that this really is only "slightly" less, we note:

Note 20.12.1. Both conditional on H = 0 and conditional on H = 1, the probability
that the observable violates (20.70) is zero.

Proof. We shall show that conditional on H = 0, the probability that the
observable violates (20.70) is zero. The conditional probability given H = 1 can be
analogously shown to be zero. The condition that (20.70) is violated is equivalent
to the condition that either $\mathbf{y}_{\mathrm{obs}} \in \mathcal{Y}_0$ or $f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}}) + f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}}) = 0$. By
(20.71), $\Pr[\mathbf{Y} \in \mathcal{Y}_0 \mid H = 0] = 0$. And, by the nonnegativity of the densities,

\[
\begin{aligned}
\Pr\bigl[f_{\mathbf{Y}|H=0}(\mathbf{Y}) + f_{\mathbf{Y}|H=1}(\mathbf{Y}) = 0 \bigm| H = 0\bigr]
&\le \Pr\bigl[f_{\mathbf{Y}|H=0}(\mathbf{Y}) = 0 \bigm| H = 0\bigr] \\
&= \int_{\{\mathbf{y} \in \mathbb{R}^d : f_{\mathbf{Y}|H=0}(\mathbf{y}) = 0\}} f_{\mathbf{Y}|H=0}(\mathbf{y}) \,\mathrm{d}\mathbf{y} \\
&= \int_{\{\mathbf{y} \in \mathbb{R}^d : f_{\mathbf{Y}|H=0}(\mathbf{y}) = 0\}} 0 \,\mathrm{d}\mathbf{y} \\
&= 0.
\end{aligned}
\]

Conditionally on H = 0, the probability of the observable violating (20.70) is thus
the probability of the union of two events, each of which is of zero probability, and
is thus of zero probability; see Corollary 21.5.2 ahead. □



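Definition 20.12.2 (Sufficient Statistic for Two Densities). We say that a mapping
$T\colon \mathbb{R}^d \to \mathbb{R}^{d'}$ forms a sufficient statistic for the density functions $f_{\mathbf{Y}|H=0}(\cdot)$
and $f_{\mathbf{Y}|H=1}(\cdot)$ on $\mathbb{R}^d$ if it is Borel measurable⁶ and if there exists a set $\mathcal{Y}_0 \subset \mathbb{R}^d$ of
Lebesgue measure zero and a Borel measurable function $\zeta\colon \mathbb{R}^{d'} \to [0, \infty]$ such that
for all $\mathbf{y}_{\mathrm{obs}} \in \mathbb{R}^d$ satisfying (20.70)

\[
\frac{f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}})}{f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}})} = \zeta\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr), \tag{20.72}
\]

where on the LHS of (20.72) we define a/0 to be +∞ whenever a > 0.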

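In our example the observation $(Y_1, Y_2)$ takes value in $\mathbb{R}^2$ so d = 2; the mapping
$T\colon (y_1, y_2) \mapsto y_1^2 + y_2^2$ is a mapping from $\mathbb{R}^2$ to $\mathbb{R}$ so d' = 1; and, by (20.67),

\[
\zeta\colon t \mapsto \frac{\sigma_1^2}{\sigma_0^2} \exp\Bigl(\frac{1}{2}\Bigl(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\Bigr) t\Bigr).
\]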



⁶ The technical condition that T(·) is Borel measurable guarantees that T(Y) is a random
vector. See for example (Billingsley, 1995, Theorem 13.1(ii)) for a discussion of this technical
issue. The issue is best seen in the scalar case. Suppose that Y is a RV defined over the
probability space $(\Omega, \mathcal{F}, P)$. If T(·) is any function, then T(Y) is a mapping from Ω to ℝ, but
we are not guaranteed that it be a RV, because for T(Y) to be a RV we must have that, for every
ξ ∈ ℝ, the set {ω ∈ Ω : T(Y(ω)) ≤ ξ} be in $\mathcal{F}$, and this is, in general, not true. However, if T(·)
is Borel measurable, then the above cited theorem guarantees that T(Y) is, indeed, a RV. Note
that any continuous function is Borel measurable (Billingsley, 1995, Theorem 13.2). In practice,
one never encounters functions that are not Borel measurable; in fact, it is hard work to construct
one.

Here we can take $\mathcal{Y}_0$ to be the empty set.⁷

We next show that if T(·) is a sufficient statistic, then there is no loss in
optimality in considering decision rules that base their decision on T(Y). This result
is almost obvious, because the MAP decision rule is optimal (Theorem 20.7.1);
because it can be expressed in terms of the likelihood-ratio function (20.40); and
because the sufficiency of T(·) implies that the likelihood-ratio function $\mathrm{LR}(\mathbf{y}_{\mathrm{obs}})$
is computable from $T(\mathbf{y}_{\mathrm{obs}})$. Nevertheless, we provide a formal proof because the
result is important.

Proposition 20.12.3. If $T\colon \mathbb{R}^d \to \mathbb{R}^{d'}$ is a sufficient statistic for the densities
$f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$, then, irrespective of the prior of H, there exists a decision
rule that guesses H based on T(Y) and which is as good as any optimal guessing
rule based on Y.

Proof. We need to show that if $\phi_{\mathrm{Guess}}(\cdot)$ is an optimal decision rule for guessing H
based on Y, then there exists a guessing rule based on T(Y) that has the same
probability of error. We note that it is enough to prove this result for a
nondegenerate prior (20.2), because for degenerate priors one can achieve zero probability
of error even without looking at T(Y): if Pr[H = 0] = 1 guess "H = 0," and if
Pr[H = 1] = 1 guess "H = 1." We thus proceed to assume a nondegenerate prior
(20.2).

Let $\phi_{\mathrm{MAP}}(\cdot)$ be the MAP rule for guessing H based on Y. Since this rule is optimal,
it suffices to exhibit a decoding rule $\phi_T(\cdot)$ based on T(Y) of equal performance.
Since T(·) is sufficient, it follows that there exists a set of Lebesgue measure zero $\mathcal{Y}_0$
and a Borel measurable function ζ(·) such that $\zeta(T(\mathbf{y}_{\mathrm{obs}})) = \mathrm{LR}(\mathbf{y}_{\mathrm{obs}})$ whenever
(20.70) holds. Based upon the observation $T(\mathbf{Y}) = T(\mathbf{y}_{\mathrm{obs}})$, the desired rule is to
guess

\[
\phi_T\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr) =
\begin{cases}
0 & \text{if } \zeta\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr) > \frac{\pi_1}{\pi_0}, \\
1 & \text{if } \zeta\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr) < \frac{\pi_1}{\pi_0}, \\
\mathcal{U}(\{0,1\}) & \text{if } \zeta\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr) = \frac{\pi_1}{\pi_0}.
\end{cases} \tag{20.73}
\]

That $\phi_T(\cdot)$ has the same performance as $\phi_{\mathrm{MAP}}(\cdot)$ now follows by noting that,
by (20.72), the two decoding rules are in agreement except perhaps for
observations $\mathbf{y}_{\mathrm{obs}}$ violating (20.70), but those, by Note 20.12.1, occur with probability zero.
The performance of $\phi_{\mathrm{MAP}}(\cdot)$ (which is optimal based on Y) and of $\phi_T(\cdot)$ (which is
based on T(Y)) are thus identical. □

Definition 20.12.2 is intuitive in that it demonstrates how one typically goes about
identifying a sufficient statistic: one computes the likelihood-ratio and checks what
it depends on. This definition, however, becomes a bit cumbersome in
multi-hypothesis testing, which we shall discuss in Chapter 21. A definition that is more
appropriate for that setting is given in Chapter 22 in terms of the computability
of the a posteriori probabilities from $T(\mathbf{y}_{\mathrm{obs}})$ (Definition 22.2.1). The purpose of
the next proposition is to show that the two definitions coincide in the binary case:
ignoring sets of Lebesgue measure zero, the likelihood-ratio can be computed from
$T(\mathbf{y}_{\mathrm{obs}})$ (whenever the ratio is not 0/0), if, and only if, for any prior $(\pi_0, \pi_1)$ one can
compute the a posteriori distribution of H from $T(\mathbf{y}_{\mathrm{obs}})$ (whenever $f_{\mathbf{Y}}(\mathbf{y}_{\mathrm{obs}}) > 0$).

⁷ We would have needed to choose a nontrivial set $\mathcal{Y}_0$ if we had changed the densities (20.66)
at a finite number of points.

We draw the reader's attention to the following subtle issue. Definition 20.12.2 
makes it clear that the sufficiency of T(-) has nothing to do with the prior; it only 
depends on the densities $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$. The equivalent definition of
sufficient statistics in terms of the computability of the a posteriori distribution 
ostensibly depends also on the prior, because it is only meaningful to discuss the a 
posteriori distribution if H has a prior. Nevertheless, the definitions are equivalent 
because in the latter definition we require that the a posteriori distribution be 
computable from T(Y) for every prior, and not just for the prior given in the 
problem's formulation. 

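Proposition 20.12.4 (Computability of the a Posteriori Distribution). Let the
mapping $T\colon \mathbb{R}^d \to \mathbb{R}^{d'}$ be Borel measurable, and let $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$ be
densities on $\mathbb{R}^d$. Then the following two conditions are equivalent:

(a) T(·) forms a sufficient statistic for the densities $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$.

(b) For some set $\mathcal{Y}_0 \subset \mathbb{R}^d$ of Lebesgue measure zero we have that for every prior
$(\pi_0, \pi_1)$ there exist Borel measurable functions from $\mathbb{R}^{d'}$ to [0, 1]

\[
\mathbf{t} \mapsto \psi_m(\pi_0, \pi_1, \mathbf{t}), \qquad m = 0, 1,
\]

such that the vector

\[
\bigl(\psi_0(\pi_0, \pi_1, T(\mathbf{y}_{\mathrm{obs}})),\ \psi_1(\pi_0, \pi_1, T(\mathbf{y}_{\mathrm{obs}}))\bigr)^{\mathsf{T}}
\]

is a probability vector, and this probability vector is equal to the vector

\[
\bigl(\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}],\ \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]\bigr)^{\mathsf{T}}, \tag{20.74}
\]

whenever both the condition $\mathbf{y}_{\mathrm{obs}} \notin \mathcal{Y}_0$ and the condition

\[
\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}}) + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}}) > 0 \tag{20.75}
\]

are satisfied. Here (20.74) is computed for H having the prior $(\pi_0, \pi_1)$ and
for the conditional densities $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$.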

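Proof. We begin by proving that (a) implies (b). That is, we assume that T(·)
forms a sufficient statistic and proceed to prove the existence of the set $\mathcal{Y}_0$ and
of the functions $\psi_0(\cdot), \psi_1(\cdot)$. Let $\mathcal{Y}_0$ and $\zeta\colon \mathbb{R}^{d'} \to [0, \infty]$ be as guaranteed by the
definition of sufficient statistics (Definition 20.12.2) so

\[
\frac{f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}})}{f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}})} = \zeta\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr), \tag{20.76}
\]

whenever $\mathbf{y}_{\mathrm{obs}}$ satisfies (20.70). We next show how to construct for every pair
$(\pi_0, \pi_1)$ the functions $\psi_0(\cdot), \psi_1(\cdot)$. We consider three cases separately: the case
$\pi_0 = 1 - \pi_1 = 1$, the case $\pi_0 = 1 - \pi_1 = 0$, and the case where both $\pi_0$ and $\pi_1$ are
strictly positive.

In the first case H is deterministically zero, and the functions $\psi_0(1, 0, \mathbf{t}) = 1$ and
$\psi_1(1, 0, \mathbf{t}) = 0$ meet our requirements. In the second case H is deterministically
one, and the functions $\psi_0(0, 1, \mathbf{t}) = 1 - \psi_1(0, 1, \mathbf{t}) = 0$ meet our requirements.

It remains to treat the case where $\pi_0, \pi_1 > 0$. We shall show that in this case the
functions

\[
\psi_0(\pi_0, \pi_1, \mathbf{t}) = \frac{\pi_0 \zeta(\mathbf{t})}{\pi_0 \zeta(\mathbf{t}) + \pi_1}, \qquad
\psi_1(\pi_0, \pi_1, \mathbf{t}) = 1 - \psi_0(\pi_0, \pi_1, \mathbf{t}), \tag{20.77}
\]

(where ∞/(∞ + a) is defined as one for all finite a) meet our requirements. To
that end we first note that $\psi_0(\pi_0, \pi_1, \mathbf{t})$ and $\psi_1(\pi_0, \pi_1, \mathbf{t})$ are nonnegative and sum
to one. We next note that, for $\pi_0, \pi_1 > 0$, the condition (20.75) implies that
$f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}})$ and $f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}})$ are not both zero. Consequently, if $\mathbf{y}_{\mathrm{obs}}$ satisfies
(20.75) and also $\mathbf{y}_{\mathrm{obs}} \notin \mathcal{Y}_0$, then it satisfies (20.70) and $\mathrm{LR}(\mathbf{y}_{\mathrm{obs}}) = \zeta(T(\mathbf{y}_{\mathrm{obs}}))$.
Thus, in the case $\pi_0, \pi_1 > 0$, we have that, whenever (20.75) and $\mathbf{y}_{\mathrm{obs}} \notin \mathcal{Y}_0$ hold,

\[
\begin{aligned}
\psi_0\bigl(\pi_0, \pi_1, T(\mathbf{y}_{\mathrm{obs}})\bigr)
&= \frac{\pi_0 \zeta(T(\mathbf{y}_{\mathrm{obs}}))}{\pi_0 \zeta(T(\mathbf{y}_{\mathrm{obs}})) + \pi_1} \\
&= \frac{\pi_0 \mathrm{LR}(\mathbf{y}_{\mathrm{obs}})}{\pi_0 \mathrm{LR}(\mathbf{y}_{\mathrm{obs}}) + \pi_1} \\
&= \frac{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}}) / f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}})}
        {\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}}) / f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}}) + \pi_1} \\
&= \frac{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}})}
        {\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}}) + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}})} \\
&= \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}],
\end{aligned}
\]

as required. This implies that, whenever (20.75) and $\mathbf{y}_{\mathrm{obs}} \notin \mathcal{Y}_0$ hold, we also have
$\psi_1(\pi_0, \pi_1, T(\mathbf{y}_{\mathrm{obs}})) = \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]$, since $\psi_1(\pi_0, \pi_1, \mathbf{t}) = 1 - \psi_0(\pi_0, \pi_1, \mathbf{t})$
and since $\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}] = 1 - \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]$; see (20.10).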

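We now prove that (b) implies (a), i.e., that the existence of the set $\mathcal{Y}_0$ and of
the functions $\psi_0(\cdot), \psi_1(\cdot)$ implies the existence of the function ζ(·). In fact, we shall
prove the stronger statement that if for some nondegenerate prior the a posteriori
distribution of H given $\mathbf{Y} = \mathbf{y}_{\mathrm{obs}}$ is computable from $T(\mathbf{y}_{\mathrm{obs}})$ (whenever (20.75)
and $\mathbf{y}_{\mathrm{obs}} \notin \mathcal{Y}_0$ hold), then there exists some function $\zeta\colon \mathbb{R}^{d'} \to [0, \infty]$ such that
$\mathrm{LR}(\mathbf{y}_{\mathrm{obs}}) = \zeta(T(\mathbf{y}_{\mathrm{obs}}))$ whenever $\mathbf{y}_{\mathrm{obs}}$ satisfies (20.70).

To construct ζ(·) from $\psi_0(\cdot)$ and $\psi_1(\cdot)$, pick some arbitrary strictly positive $\pi_0, \pi_1$
summing to one (e.g., $\pi_0 = \pi_1 = 1/2$), and define ζ(·) by

\[
\zeta\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr) = \frac{\pi_1 \psi_0\bigl(\pi_0, \pi_1, T(\mathbf{y}_{\mathrm{obs}})\bigr)}{\pi_0 \psi_1\bigl(\pi_0, \pi_1, T(\mathbf{y}_{\mathrm{obs}})\bigr)}, \tag{20.78}
\]

using the convention that a/0 = ∞ for all a > 0; see (20.39).

We next verify that if $\mathbf{y}_{\mathrm{obs}}$ satisfies (20.70) then $\zeta(T(\mathbf{y}_{\mathrm{obs}})) = \mathrm{LR}(\mathbf{y}_{\mathrm{obs}})$. To
this end, define H to have the law $\Pr[H = 0] = \pi_0$ and $\Pr[H = 1] = \pi_1$,
and let the conditional law of Y given H be as specified by the given densities.
Since $\pi_0$ and $\pi_1$ are strictly positive, it follows that whenever $f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}})$ and
$f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}})$ are not both zero, we also have $\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}}) + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}}) > 0$.
Consequently, for strictly positive $\pi_0, \pi_1$ we have that (20.70) implies that $\mathbf{y}_{\mathrm{obs}} \notin \mathcal{Y}_0$
and $\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}}) + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}}) > 0$ and thus, for observations $\mathbf{y}_{\mathrm{obs}}$ satisfying
(20.70),

\[
\begin{aligned}
\zeta\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)
&= \frac{\pi_1 \psi_0\bigl(\pi_0, \pi_1, T(\mathbf{y}_{\mathrm{obs}})\bigr)}{\pi_0 \psi_1\bigl(\pi_0, \pi_1, T(\mathbf{y}_{\mathrm{obs}})\bigr)} \\
&= \frac{\Pr[H = 1] \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]}{\Pr[H = 0] \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]} \\
&= \mathrm{LR}(\mathbf{y}_{\mathrm{obs}}),
\end{aligned}
\]

where the last equality follows by dividing the equation

\[
\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]
= \frac{\Pr[H = 0]\, f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}})}{\Pr[H = 0]\, f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}}) + \Pr[H = 1]\, f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}})}
\]

(which is a restatement of (20.9a) for our case) by

\[
\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]
= \frac{\Pr[H = 1]\, f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}})}{\Pr[H = 0]\, f_{\mathbf{Y}|H=0}(\mathbf{y}_{\mathrm{obs}}) + \Pr[H = 1]\, f_{\mathbf{Y}|H=1}(\mathbf{y}_{\mathrm{obs}})}
\]

(which is a restatement of (20.9b) for our case). □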
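The two directions of the proof amount to a pair of mutually inverse maps between the likelihood ratio and the a posteriori probabilities. The Python sketch below illustrates (20.77) and (20.78) for a nondegenerate prior; the function names and the numerical values are hypothetical, and the conventions ∞/(∞ + a) = 1 and a/0 = ∞ are handled explicitly.

```python
import math


def posterior_from_lr(pi0, lr):
    """(20.77): map a likelihood-ratio value in [0, inf] to (psi0, psi1),
    assuming a nondegenerate prior 0 < pi0 < 1."""
    pi1 = 1.0 - pi0
    if math.isinf(lr):
        # Convention: inf / (inf + a) is defined as one for finite a.
        return 1.0, 0.0
    psi0 = pi0 * lr / (pi0 * lr + pi1)
    return psi0, 1.0 - psi0


def lr_from_posterior(pi0, psi0, psi1):
    """(20.78): recover the likelihood ratio from the a posteriori
    probabilities, assuming a nondegenerate prior 0 < pi0 < 1."""
    pi1 = 1.0 - pi0
    if psi1 == 0.0:
        # Convention: a / 0 = inf for a > 0.
        return math.inf
    return pi1 * psi0 / (pi0 * psi1)


# Assumed example values: prior (0.3, 0.7) and densities f0 = 0.5, f1 = 0.2
# at the observation, so that the likelihood ratio is 2.5.
pi0, f0, f1 = 0.3, 0.5, 0.2
psi0, psi1 = posterior_from_lr(pi0, f0 / f1)
direct = pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)  # Bayes' rule computed directly
print(psi0, direct, lr_from_posterior(pi0, psi0, psi1))
```

The first two printed numbers should agree up to rounding, and the third should recover the likelihood ratio 2.5.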

Once we have identified a sufficient statistic T(Y), we can proceed to derive an 
optimal guessing rule using two methods that we describe next. Again, we focus 
on nondegenerate priors. 



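Method 1: We ignore the fact that T(Y) forms a sufficient statistic and simply
use the MAP rule (20.40):

\[
\phi_{\mathrm{MAP}}(\mathbf{y}_{\mathrm{obs}}) =
\begin{cases}
0 & \text{if } \mathrm{LR}(\mathbf{y}_{\mathrm{obs}}) > \frac{\pi_1}{\pi_0}, \\
1 & \text{if } \mathrm{LR}(\mathbf{y}_{\mathrm{obs}}) < \frac{\pi_1}{\pi_0}, \\
\mathcal{U}(\{0,1\}) & \text{if } \mathrm{LR}(\mathbf{y}_{\mathrm{obs}}) = \frac{\pi_1}{\pi_0}.
\end{cases} \tag{20.79}
\]

(Because T(Y) is a sufficient statistic, the likelihood-ratio function $\mathrm{LR}(\mathbf{y}_{\mathrm{obs}})$ will
be computable from $T(\mathbf{y}_{\mathrm{obs}})$ whenever $\mathrm{LR}(\mathbf{y}_{\mathrm{obs}})$ does not have the pathological
form 0/0 and $\mathbf{y}_{\mathrm{obs}}$ does not lie in the exception set $\mathcal{Y}_0$. Such pathological observations
occur with probability zero (20.12), so we need not worry about them.)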



Method 2: By Proposition 20.12.3, there is no loss in optimality in forming our
guess based on T(Y). So we can use any optimal rule, e.g., the MAP rule, for
guessing H based on the new d'-dimensional observations $\mathbf{t}_{\mathrm{obs}} = T(\mathbf{y}_{\mathrm{obs}})$. This
method requires computing the conditional distribution of the random d'-vector
T = T(Y) conditional on H = 0 and conditional on H = 1 and deciding according
to the rule:

\[
\phi_{\mathrm{Guess}}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr) =
\begin{cases}
0 & \text{if } \pi_0 f_{\mathbf{T}|H=0}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr) > \pi_1 f_{\mathbf{T}|H=1}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr), \\
1 & \text{if } \pi_0 f_{\mathbf{T}|H=0}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr) < \pi_1 f_{\mathbf{T}|H=1}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr),
\end{cases} \tag{20.80}
\]

with ties being resolved at random.

Why would we want to use Method 2 when we have already computed the likelihood- 
ratio function to establish the sufficiency of the statistic? The answer is that some- 
times one can demonstrate that T(Y) forms a sufficient statistic by methods that 
are not based on the computation of the likelihood-ratio. In such cases, Method 2 
may be advantageous. Also, sometimes the analysis of the probability of error in 
Method 2 is easier. The choice is ours. 

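Returning to the example of (20.66), we demonstrate Method 2 by calculating
the law of the sufficient statistic $T = Y_1^2 + Y_2^2$ under each of the hypotheses.
Recalling that the sum of the squares of two IID zero-mean Gaussians is exponential
(Note 19.8.1) we obtain:

\[
f_{T|H=0}(t) = \frac{1}{2\sigma_0^2} \exp\Bigl(-\frac{t}{2\sigma_0^2}\Bigr), \qquad t \ge 0, \tag{20.81a}
\]

\[
f_{T|H=1}(t) = \frac{1}{2\sigma_1^2} \exp\Bigl(-\frac{t}{2\sigma_1^2}\Bigr), \qquad t \ge 0. \tag{20.81b}
\]

Consequently, the likelihood-ratio is given by

\[
\frac{f_{T|H=0}(t)}{f_{T|H=1}(t)} = \frac{\sigma_1^2}{\sigma_0^2} \exp\Bigl(t\Bigl(\frac{1}{2\sigma_1^2} - \frac{1}{2\sigma_0^2}\Bigr)\Bigr), \qquad t \ge 0,
\]

and the log likelihood-ratio by

\[
\ln\frac{f_{T|H=0}(t)}{f_{T|H=1}(t)} = \ln\frac{\sigma_1^2}{\sigma_0^2} + t\Bigl(\frac{1}{2\sigma_1^2} - \frac{1}{2\sigma_0^2}\Bigr), \qquad t \ge 0.
\]

We thus guess "H = 0" if the log likelihood-ratio is positive, i.e., if

\[
t > \frac{2\sigma_0^2\sigma_1^2}{\sigma_0^2 - \sigma_1^2} \ln\frac{\sigma_0^2}{\sigma_1^2}.
\]

We similarly guess "H = 1" if the log likelihood-ratio is negative, and flip a coin if
it is zero. This is the same rule we obtained in (20.68) based on Method 1.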
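One benefit of Method 2 mentioned above is that the error analysis can be easier. As an illustration (not a computation carried out in the text), the exponential laws (20.81) give the conditional error probabilities of the threshold rule in closed form: under H = 0 an error occurs when T falls at or below the threshold, and under H = 1 when T exceeds it. The Python sketch below evaluates these for a uniform prior; the function names and the values σ0 = 2, σ1 = 1 are assumptions.

```python
import math


def ml_threshold(sigma0, sigma1):
    """Threshold on t = y1^2 + y2^2 from (20.68); requires sigma0 > sigma1 > 0."""
    v0, v1 = sigma0 ** 2, sigma1 ** 2
    return 2.0 * v0 * v1 / (v0 - v1) * math.log(v0 / v1)


def error_probability_uniform_prior(sigma0, sigma1):
    """Error probability of the rule 'guess H = 0 iff T > threshold' under a
    uniform prior, using the exponential laws (20.81a) and (20.81b)."""
    tau = ml_threshold(sigma0, sigma1)
    # Under H = 0, T is exponential with mean 2*sigma0^2; error if T <= tau.
    p_err_h0 = 1.0 - math.exp(-tau / (2.0 * sigma0 ** 2))
    # Under H = 1, T is exponential with mean 2*sigma1^2; error if T > tau.
    p_err_h1 = math.exp(-tau / (2.0 * sigma1 ** 2))
    return 0.5 * p_err_h0 + 0.5 * p_err_h1


p_error = error_probability_uniform_prior(2.0, 1.0)
print(f"probability of error ~ {p_error:.6f}")
```

For these assumed values the threshold is (8/3) ln 4, so the two conditional error probabilities reduce to 1 − 4^(−1/3) and 4^(−4/3).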

20.13 Consequences of Optimality 

Consider the problem of guessing an a priori uniformly distributed binary
random variable H based on the observable Y whose conditional law given H = 0
is $\mathcal{N}(0, \sigma^2)$ and whose conditional distribution given H = 1 is $\mathcal{N}(1, \sigma^2)$. To
derive an optimal guessing rule we could derive the MAP rule by computing the
likelihood-ratio function as we did in Section 20.10. But having already carried
out the calculations in Section 20.10 for testing whether an observation was drawn
$\mathcal{N}(A, \sigma^2)$ or $\mathcal{N}(-A, \sigma^2)$, there is a better way. Let

\[
T = Y - \frac{1}{2}. \tag{20.82}
\]

Because there is a one-to-one relationship between Y and T, there is no loss in
optimality in subtracting 1/2 from Y to obtain T and in then applying an optimal
decision rule to T. Indeed, since Y = T + 1/2, it follows that Y is the result of
processing T with respect to H, so no decision rule based on Y can outperform an
optimal decision rule based on T (Theorem 20.11.5). (Of course, no decision rule
based on T can outperform an optimal one based on Y, because T is the result of
processing Y with respect to H.) In fact, using the terminology of Section 20.12,
$T\colon y \mapsto y - 1/2$ forms a sufficient statistic for guessing H based on Y, because the
likelihood-ratio function $\mathrm{LR}(y_{\mathrm{obs}}) = f_{Y|H=0}(y_{\mathrm{obs}}) / f_{Y|H=1}(y_{\mathrm{obs}})$ can be expressed
as $\zeta(T(y_{\mathrm{obs}}))$ for the mapping $\zeta\colon t \mapsto \mathrm{LR}(t + 1/2)$. Consequently, our assertion
that there is no loss in optimality in forming our guess based on T(Y) is just a
consequence of Proposition 20.12.3.

Conditional on H = 0, the random variable T(Y) is $\mathcal{N}(-0.5, \sigma^2)$, and, conditional
on H = 1, it is $\mathcal{N}(+0.5, \sigma^2)$. Consequently, using the results of Section 20.10 (with
the substitution of 1/2 for A), we obtain that an optimal rule based on T is to guess
"H = 0" if T is negative, and to guess "H = 1" if T is positive. To summarize, the
decision rule we derived is to guess "H = 0" if Y − 1/2 < 0 and to guess "H = 1"
if Y − 1/2 > 0.

In the terminology of Section 20.12, we used the fact that the transformation in 
(20.82) is one-to-one to conclude that T(-) forms a sufficient statistic, and we then 
used Method 2 from that section to derive an optimal decision rule. 



20.14 Multi-Dimensional Binary Gaussian Hypothesis Testing 

We now come closer to the receiver front end. The kind of problem we would
eventually like to address is the hypothesis testing problem in which, conditional
on H = 0, the observable is a continuous-time waveform of the form $s_0(t) + N(t)$
whereas, conditional on H = 1, it is of the form $s_1(t) + N(t)$, where $(N(t), t \in \mathbb{R})$
is some continuous-time stochastic process modeling the noise. This problem will
be addressed in Chapter 26. For now we only address the discrete-time version of
this problem.



20.14.1 The Setup 

We consider the problem of guessing the random variable H that takes on the
values 0 and 1 with positive probabilities $\pi_0$ and $\pi_1$. The observable $\mathbf{Y} \in \mathbb{R}^J$ is
a random vector with J components $Y^{(1)}, \ldots, Y^{(J)}$.⁸ Conditional on H = 0, the
components of Y are independent Gaussians with $Y^{(j)} \sim \mathcal{N}(s_0^{(j)}, \sigma^2)$, where $\mathbf{s}_0$ is
some deterministic vector of J components $s_0^{(1)}, \ldots, s_0^{(J)}$, and where $\sigma^2 > 0$.
Conditional on H = 1, the components of Y are independent with $Y^{(j)} \sim \mathcal{N}(s_1^{(j)}, \sigma^2)$,
for some other deterministic vector $\mathbf{s}_1$ of J components $s_1^{(1)}, \ldots, s_1^{(J)}$. We assume
that $\mathbf{s}_0$ and $\mathbf{s}_1$ differ in at least one coordinate. The setup can be described as

\[
\begin{aligned}
H = 0 &: \quad Y^{(j)} = s_0^{(j)} + Z^{(j)}, \qquad j = 1, 2, \ldots, J, \\
H = 1 &: \quad Y^{(j)} = s_1^{(j)} + Z^{(j)}, \qquad j = 1, 2, \ldots, J,
\end{aligned}
\]

where $Z^{(1)}, Z^{(2)}, \ldots, Z^{(J)}$ are IID $\mathcal{N}(0, \sigma^2)$.

For typographical reasons, instead of denoting the observed vector by $\mathbf{y}_{\mathrm{obs}}$, we now
denote it by $\mathbf{y}$ and its J components by $y^{(1)}, \ldots, y^{(J)}$.

20.14.2 An Optimal Decision Rule 

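To find an optimal guessing rule we compute the likelihood-ratio function:

\[
\begin{aligned}
\mathrm{LR}(\mathbf{y}) &= \frac{f_{\mathbf{Y}|H=0}(\mathbf{y})}{f_{\mathbf{Y}|H=1}(\mathbf{y})} \\
&= \frac{\prod_{j=1}^{J} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Bigl(-\frac{(y^{(j)} - s_0^{(j)})^2}{2\sigma^2}\Bigr)}
        {\prod_{j=1}^{J} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Bigl(-\frac{(y^{(j)} - s_1^{(j)})^2}{2\sigma^2}\Bigr)} \\
&= \prod_{j=1}^{J} \exp\Bigl(-\frac{(y^{(j)} - s_0^{(j)})^2}{2\sigma^2} + \frac{(y^{(j)} - s_1^{(j)})^2}{2\sigma^2}\Bigr),
\qquad \mathbf{y} \in \mathbb{R}^J.
\end{aligned}
\]

The log likelihood-ratio function is thus given by

\[
\begin{aligned}
\mathrm{LLR}(\mathbf{y}) &= \ln \mathrm{LR}(\mathbf{y}) \\
&= \frac{1}{2\sigma^2} \sum_{j=1}^{J} \Bigl(\bigl(y^{(j)} - s_1^{(j)}\bigr)^2 - \bigl(y^{(j)} - s_0^{(j)}\bigr)^2\Bigr) \\
&= \frac{1}{\sigma^2} \Bigl(\langle \mathbf{y}, \mathbf{s}_0 - \mathbf{s}_1 \rangle_{\mathrm{E}} + \frac{\|\mathbf{s}_1\|^2 - \|\mathbf{s}_0\|^2}{2}\Bigr) \\
&= \frac{1}{\sigma^2} \Bigl(\langle \mathbf{y}, \mathbf{s}_0 - \mathbf{s}_1 \rangle_{\mathrm{E}}
   - \frac{\langle \mathbf{s}_0, \mathbf{s}_0 - \mathbf{s}_1 \rangle_{\mathrm{E}} + \langle \mathbf{s}_1, \mathbf{s}_0 - \mathbf{s}_1 \rangle_{\mathrm{E}}}{2}\Bigr) \\
&= \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{\sigma^2} \Bigl(\Bigl\langle \mathbf{y}, \frac{\mathbf{s}_0 - \mathbf{s}_1}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \Bigr\rangle_{\mathrm{E}}
   - \frac{1}{2}\Bigl(\Bigl\langle \mathbf{s}_0, \frac{\mathbf{s}_0 - \mathbf{s}_1}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \Bigr\rangle_{\mathrm{E}}
   + \Bigl\langle \mathbf{s}_1, \frac{\mathbf{s}_0 - \mathbf{s}_1}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \Bigr\rangle_{\mathrm{E}}\Bigr)\Bigr) \\
&= \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{\sigma^2} \Bigl(\langle \mathbf{y}, \boldsymbol{\phi} \rangle_{\mathrm{E}}
   - \frac{1}{2}\bigl(\langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_{\mathrm{E}} + \langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_{\mathrm{E}}\bigr)\Bigr),
\qquad \mathbf{y} \in \mathbb{R}^J,
\end{aligned} \tag{20.83}
\]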



⁸ We use J rather than d in order to comply with the notation of Section 21.6 ahead.

Figure 20.5: Effect of the ratio $\pi_0/\pi_1$ on the decision rule. (Three panels, for
$\pi_0 < \pi_1$, $\pi_0 = \pi_1$, and $\pi_0 > \pi_1$, each showing the "guess 0" and "guess 1" regions.)



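where for real vectors $\mathbf{u} = (u^{(1)}, \ldots, u^{(J)})^{\mathsf{T}}$ and $\mathbf{v} = (v^{(1)}, \ldots, v^{(J)})^{\mathsf{T}}$ taking value
in $\mathbb{R}^J$ we define⁹

\[
\langle \mathbf{u}, \mathbf{v} \rangle_{\mathrm{E}} \triangleq \sum_{j=1}^{J} u^{(j)} v^{(j)}, \tag{20.84}
\]

\[
\|\mathbf{u}\| \triangleq \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle_{\mathrm{E}}} = \sqrt{\sum_{j=1}^{J} \bigl(u^{(j)}\bigr)^2}, \tag{20.85}
\]

and where

\[
\boldsymbol{\phi} = \frac{\mathbf{s}_0 - \mathbf{s}_1}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \tag{20.86}
\]

is a unit-norm vector pointing from $\mathbf{s}_1$ to $\mathbf{s}_0$.

An optimal decision rule is to guess "H = 0" when $\mathrm{LLR}(\mathbf{y}) \ge \ln\frac{\pi_1}{\pi_0}$, i.e.,

\[
\text{Guess ``}H = 0\text{'' if}\quad
\langle \mathbf{y}, \boldsymbol{\phi} \rangle_{\mathrm{E}} \ge
\frac{\langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_{\mathrm{E}} + \langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_{\mathrm{E}}}{2}
+ \frac{\sigma^2}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln\frac{\pi_1}{\pi_0}, \tag{20.87}
\]

and to guess "H = 1" otherwise. This decision rule is illustrated in Figure 20.5.
Depicted are the cases where $\pi_1/\pi_0$ is smaller than one, equal to one, and larger
than one.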

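It is interesting to note that the projection $\langle \mathbf{y}, \boldsymbol{\phi} \rangle_{\mathrm{E}}\, \boldsymbol{\phi}$ of $\mathbf{y}$ onto the normalized
vector $\boldsymbol{\phi} = (\mathbf{s}_0 - \mathbf{s}_1)/\|\mathbf{s}_0 - \mathbf{s}_1\|$ forms a sufficient statistic for this problem. Indeed,
by (20.83), the log likelihood-ratio (and hence the likelihood-ratio) function is
computable from $\langle \mathbf{y}, \boldsymbol{\phi} \rangle_{\mathrm{E}}$. The projection is depicted in Figure 20.6.

The rule (20.87) simplifies if H has a uniform prior. In this case the rule is

\[
\text{Guess ``}H = 0\text{'' if}\quad
\langle \mathbf{y}, \boldsymbol{\phi} \rangle_{\mathrm{E}} \ge
\frac{\langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_{\mathrm{E}} + \langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_{\mathrm{E}}}{2}. \tag{20.88}
\]

Note that in this case the guessing rule can be implemented even if $\sigma^2$ is unknown.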
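The decision rule (20.87) is straightforward to implement once the unit vector of (20.86) is formed. The following Python sketch is a minimal illustration using plain lists; the function names and the example vectors are hypothetical, and ties are resolved in favor of "H = 0" as in (20.87).

```python
import math


def inner(u, v):
    """Standard inner product (20.84) between two J-tuples."""
    return sum(a * b for a, b in zip(u, v))


def map_guess(y, s0, s1, sigma2, pi0):
    """Rule (20.87): guess 0 iff <y, phi> is at least the midpoint of the
    projections of s0 and s1 plus a prior-dependent offset."""
    pi1 = 1.0 - pi0
    diff = [a - b for a, b in zip(s0, s1)]
    dist = math.sqrt(inner(diff, diff))       # ||s0 - s1||, assumed > 0
    phi = [d / dist for d in diff]            # unit vector of (20.86)
    threshold = (0.5 * (inner(s0, phi) + inner(s1, phi))
                 + (sigma2 / dist) * math.log(pi1 / pi0))
    return 0 if inner(y, phi) >= threshold else 1


# Assumed example: two antipodal signals in the plane with unit noise variance.
s0, s1 = [1.0, 1.0], [-1.0, -1.0]
print(map_guess([0.5, 0.2], s0, s1, sigma2=1.0, pi0=0.5))
print(map_guess([-0.5, 0.2], s0, s1, sigma2=1.0, pi0=0.5))
print(map_guess([-0.5, 0.2], s0, s1, sigma2=1.0, pi0=0.9))
```

With a uniform prior the second observation lies on the side of s1 and is decoded as 1; with the prior tilted toward H = 0 the threshold moves toward s1 and the same observation is decoded as 0. Note that `sigma2` only enters through the offset term, which vanishes for a uniform prior.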



⁹ This is sometimes called the standard inner product on $\mathbb{R}^J$ or the inner product between
J-tuples. The subscript "E" stands here for "Euclidean."

Figure 20.6: The projection of $\mathbf{y}$ onto the normalized vector $\boldsymbol{\phi} = (\mathbf{s}_0 - \mathbf{s}_1)/\|\mathbf{s}_0 - \mathbf{s}_1\|$.



20.14.3 Error Probability Analysis 

We next find the error probability associated with our guessing rule. We
denote the conditional probabilities of error associated with our guessing rule by
$p_{\mathrm{MAP}}(\text{error} \mid H = 0)$ and $p_{\mathrm{MAP}}(\text{error} \mid H = 1)$. Since our rule is optimal, its
unconditional probability of error is $p^*(\text{error})$ and is given by

\[
p^*(\text{error}) = \pi_0\, p_{\mathrm{MAP}}(\text{error} \mid H = 0) + \pi_1\, p_{\mathrm{MAP}}(\text{error} \mid H = 1). \tag{20.89}
\]

Because in (20.87) we resolved ties by guessing "H = 0", it follows that to evaluate
$p_{\mathrm{MAP}}(\text{error} \mid H = 0)$ we need to evaluate the probability that a random vector Y
drawn according to the density $f_{\mathbf{Y}|H=0}(\cdot)$ is such that the a posteriori probability
of H = 0 is strictly smaller than the a posteriori probability of H = 1. Thus, if
ties in the a posteriori distribution of H are resolved in favor of guessing "H = 0",
then

\[
p_{\mathrm{MAP}}(\text{error} \mid H = 0) = \Pr\bigl[\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{Y}) < \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{Y}) \bigm| H = 0\bigr]. \tag{20.90}
\]

This may seem self-referential, but it is not. Another way to state this is

\[
p_{\mathrm{MAP}}(\text{error} \mid H = 0) = \int_{\mathbf{y} \notin \mathcal{B}_{1,0}} f_{\mathbf{Y}|H=0}(\mathbf{y}) \,\mathrm{d}\mathbf{y}, \tag{20.91}
\]

where

\[
\mathcal{B}_{1,0} = \bigl\{\mathbf{y} \in \mathbb{R}^J : \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\}. \tag{20.92}
\]

To compute this probability we need the following lemma:

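Lemma 20.14.1. Let $\pi_0$ and $\pi_1$ be strictly positive but not necessarily sum to one.
Let the vectors $\mathbf{s}_0, \mathbf{s}_1 \in \mathbb{R}^J$ differ in at least one component, i.e., $\|\mathbf{s}_0 - \mathbf{s}_1\| > 0$. Let

\[
\begin{aligned}
f_0(\mathbf{y}) &= \bigl(2\pi\sigma^2\bigr)^{-J/2} \exp\Bigl(-\frac{1}{2\sigma^2} \sum_{j=1}^{J} \bigl(y^{(j)} - s_0^{(j)}\bigr)^2\Bigr), \qquad \mathbf{y} \in \mathbb{R}^J, \\
f_1(\mathbf{y}) &= \bigl(2\pi\sigma^2\bigr)^{-J/2} \exp\Bigl(-\frac{1}{2\sigma^2} \sum_{j=1}^{J} \bigl(y^{(j)} - s_1^{(j)}\bigr)^2\Bigr), \qquad \mathbf{y} \in \mathbb{R}^J,
\end{aligned}
\]

where $\sigma^2 > 0$. Define

\[
\mathcal{B}_{1,0} = \bigl\{\mathbf{y} \in \mathbb{R}^J : \pi_0 f_0(\mathbf{y}) \ge \pi_1 f_1(\mathbf{y})\bigr\}.
\]

Then

\[
\int_{\mathbf{y} \notin \mathcal{B}_{1,0}} f_0(\mathbf{y}) \,\mathrm{d}\mathbf{y}
= Q\Bigl(\frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln\frac{\pi_0}{\pi_1}\Bigr). \tag{20.93}
\]

This equality continues to hold if we replace the weak inequality (≥) in the definition
of $\mathcal{B}_{1,0}$ with a strict inequality (>).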

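Proof. Using a calculation identical to the one leading to (20.83) we obtain that
the set $\mathcal{B}_{1,0}$ can also be expressed as

\[
\mathcal{B}_{1,0} = \Bigl\{\mathbf{y} \in \mathbb{R}^J :
\langle \mathbf{y}, \boldsymbol{\phi} \rangle_{\mathrm{E}} \ge
\frac{\langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_{\mathrm{E}} + \langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_{\mathrm{E}}}{2}
+ \frac{\sigma^2}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln\frac{\pi_1}{\pi_0}\Bigr\}, \tag{20.94}
\]

where $\boldsymbol{\phi}$ is defined in (20.86).

The density $f_0(\cdot)$ is the same as the density of the vector $\mathbf{s}_0 + \mathbf{Z}$, where the
components $Z^{(1)}, \ldots, Z^{(J)}$ of $\mathbf{Z}$ are IID $\mathcal{N}(0, \sigma^2)$. Thus, the LHS of (20.93) can be
expressed as

\[
\begin{aligned}
\int_{\mathbf{y} \notin \mathcal{B}_{1,0}} f_0(\mathbf{y}) \,\mathrm{d}\mathbf{y}
&= \Pr\Bigl[\langle \mathbf{s}_0 + \mathbf{Z}, \boldsymbol{\phi} \rangle_{\mathrm{E}} <
   \frac{\langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_{\mathrm{E}} + \langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_{\mathrm{E}}}{2}
   + \frac{\sigma^2}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln\frac{\pi_1}{\pi_0}\Bigr] \\
&= \Pr\Bigl[\langle \mathbf{Z}, \boldsymbol{\phi} \rangle_{\mathrm{E}} <
   \frac{\langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_{\mathrm{E}} - \langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_{\mathrm{E}}}{2}
   + \frac{\sigma^2}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln\frac{\pi_1}{\pi_0}\Bigr] \\
&= \Pr\Bigl[-\langle \mathbf{Z}, \boldsymbol{\phi} \rangle_{\mathrm{E}} >
   \frac{\langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_{\mathrm{E}} - \langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_{\mathrm{E}}}{2}
   + \frac{\sigma^2}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln\frac{\pi_0}{\pi_1}\Bigr] \\
&= \Pr\Bigl[-\langle \mathbf{Z}, \boldsymbol{\phi} \rangle_{\mathrm{E}} >
   \frac{\langle \mathbf{s}_0 - \mathbf{s}_1, \boldsymbol{\phi} \rangle_{\mathrm{E}}}{2}
   + \frac{\sigma^2}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln\frac{\pi_0}{\pi_1}\Bigr] \\
&= \Pr\Bigl[\langle \mathbf{Z}, -\boldsymbol{\phi} \rangle_{\mathrm{E}} >
   \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2}
   + \frac{\sigma^2}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln\frac{\pi_0}{\pi_1}\Bigr] \\
&= Q\Bigl(\frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln\frac{\pi_0}{\pi_1}\Bigr),
\end{aligned}
\]

where the first equality follows from (20.94) and from the observation that the
density $f_0(\cdot)$ is the density of $\mathbf{s}_0 + \mathbf{Z}$; the second because $\langle \cdot, \cdot \rangle_{\mathrm{E}}$ is linear in the first
argument, so $\langle \mathbf{s}_0 + \mathbf{Z}, \boldsymbol{\phi} \rangle_{\mathrm{E}} = \langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_{\mathrm{E}} + \langle \mathbf{Z}, \boldsymbol{\phi} \rangle_{\mathrm{E}}$; the third by noting that multiplying
both sides of an inequality by (−1) requires changing the direction of the inequality;
the fourth by the linear relationship $\langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_{\mathrm{E}} - \langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_{\mathrm{E}} = \langle \mathbf{s}_0 - \mathbf{s}_1, \boldsymbol{\phi} \rangle_{\mathrm{E}}$; the fifth by
(20.86); and the final equality because, as we next argue, $\langle \mathbf{Z}, -\boldsymbol{\phi} \rangle_{\mathrm{E}} \sim \mathcal{N}(0, \sigma^2)$, so
we can employ (19.12a). To see that $\langle \mathbf{Z}, -\boldsymbol{\phi} \rangle_{\mathrm{E}} \sim \mathcal{N}(0, \sigma^2)$, note that, by (20.86),
$\|-\boldsymbol{\phi}\| = 1$ and then employ Proposition 19.7.3.

This establishes the first part of the lemma. The result where the weak inequality
is replaced with a strict inequality follows by replacing all the weak inequalities
in the proof with the corresponding strict inequalities and vice versa. (If X has a
density, then Pr[X < ξ] = Pr[X ≤ ξ].) □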



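By applying Lemma 20.14.1 to our problem we obtain

\[ p_{\text{MAP}}(\text{error} \mid H = 0) = Q\Bigl( \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln\frac{\pi_0}{\pi_1} \Bigr). \tag{20.95} \]

Similarly, one can show that

\[ p_{\text{MAP}}(\text{error} \mid H = 1) = Q\Bigl( \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln\frac{\pi_1}{\pi_0} \Bigr). \tag{20.96} \]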



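Consequently, by (20.89)

\begin{align*}
p^*(\text{error}) &= \pi_0 \, Q\Bigl( \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln\frac{\pi_0}{\pi_1} \Bigr) \\
&\quad + \pi_1 \, Q\Bigl( \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln\frac{\pi_1}{\pi_0} \Bigr). \tag{20.97}
\end{align*}

In the special case where the prior is uniform we obtain from (20.95), (20.96), and
(20.97)

\[ p^*(\text{error}) = p_{\text{MAP}}(\text{error} \mid H = 0) = p_{\text{MAP}}(\text{error} \mid H = 1) = Q\Bigl( \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2\sigma} \Bigr). \tag{20.98} \]

This has a nice geometric interpretation. It is the probability that a $\mathcal{N}(0, \sigma^2)$ RV
exceeds half the distance between the vectors $\mathbf{s}_0$ and $\mathbf{s}_1$. Stated differently, since
$\|\mathbf{s}_0 - \mathbf{s}_1\| / \sigma$ is the number of standard deviations that separate $\mathbf{s}_0$ and $\mathbf{s}_1$, we can
express the probability of error as the probability that a standard Gaussian exceeds
half the distance between the vectors as measured in standard deviations of the
noise.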


20.14.4 The Bhattacharyya Bound 

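Finally, we compute the Bhattacharyya Bound for this problem. From (20.50) we
obtain that, irrespective of the values of $\pi_0, \pi_1$,

\begin{align*}
p^*(\text{error}) &\le \frac{1}{2} \int_{\mathbf{y} \in \mathbb{R}^J} \sqrt{f_{\mathbf{Y}|H=0}(\mathbf{y}) \, f_{\mathbf{Y}|H=1}(\mathbf{y})} \, d\mathbf{y} \\
&= \frac{1}{2} \exp\Bigl( -\frac{\|\mathbf{s}_0 - \mathbf{s}_1\|^2}{8\sigma^2} \Bigr) \int_{\mathbf{y} \in \mathbb{R}^J} (2\pi\sigma^2)^{-J/2} \exp\Bigl( -\frac{1}{2\sigma^2} \Bigl\| \mathbf{y} - \frac{\mathbf{s}_0 + \mathbf{s}_1}{2} \Bigr\|^2 \Bigr) d\mathbf{y} \\
&= \frac{1}{2} \exp\Bigl( -\frac{\|\mathbf{s}_0 - \mathbf{s}_1\|^2}{8\sigma^2} \Bigr), \tag{20.99}
\end{align*}

where the last integral is evaluated using (19.22).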
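
As a numerical sanity check, one can compare the Bhattacharyya Bound for this Gaussian problem, which evaluates to $\frac{1}{2} \exp(-\|\mathbf{s}_0 - \mathbf{s}_1\|^2 / (8\sigma^2))$, with the exact error probability under a uniform prior, $Q(\|\mathbf{s}_0 - \mathbf{s}_1\| / (2\sigma))$. The sketch below is only illustrative; the function names are ours.

```python
import math


def bhattacharyya_bound(s0, s1, sigma):
    """Bhattacharyya Bound (1/2) exp(-||s0 - s1||^2 / (8 sigma^2)) for the Gaussian problem."""
    return 0.5 * math.exp(-math.dist(s0, s1) ** 2 / (8.0 * sigma ** 2))


def exact_error_uniform_prior(s0, s1, sigma):
    """Exact optimal error Q(||s0 - s1|| / (2 sigma)) when the prior is uniform."""
    x = math.dist(s0, s1) / (2.0 * sigma)
    return 0.5 * math.erfc(x / math.sqrt(2.0))


if __name__ == "__main__":
    s0, s1 = (1.0, 0.0), (-1.0, 0.0)
    for sigma in (0.25, 0.5, 1.0, 2.0):
        exact = exact_error_uniform_prior(s0, s1, sigma)
        bound = bhattacharyya_bound(s0, s1, sigma)
        print(f"sigma={sigma}: exact={exact:.6f}  bound={bound:.6f}")
```

The bound is loose at high noise (it tends to 1/2, as does the exact error) and captures the correct exponential decay in $\|\mathbf{s}_0 - \mathbf{s}_1\|^2 / \sigma^2$ at low noise.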

20.15 Guessing in the Presence of a Random Parameter 

We now consider the guessing problem when the distribution of the observable $\mathbf{Y}$
depends not only on the hypothesis $H$ but also on a random parameter $\Theta$, which
is independent of $H$. Based on the conditional densities $f_{\mathbf{Y}|\Theta,H=0}(\cdot)$, $f_{\mathbf{Y}|\Theta,H=1}(\cdot)$,
the nondegenerate prior $\pi_0, \pi_1 > 0$, and on the law of $\Theta$, we seek an optimal rule
for guessing $H$. We distinguish between two cases depending on whether we must
base our guess on the observed value $\mathbf{y}_{\text{obs}}$ of $\mathbf{Y}$ alone (random parameter not
observed) or whether we also observe the value $\theta_{\text{obs}}$ of $\Theta$ (random parameter
observed). The analysis of both cases is conceptually straightforward.

20.15.1 Random Parameter Not Observed 



The guessing problem when the random parameter is not observed is sometimes
called "testing in the presence of a nuisance parameter." Conceptually, the
situation is quite simple. We have only one observation, $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, and an optimal
decision rule is the MAP rule (Theorem 20.7.1). The MAP rule entails computing
the likelihood-ratio function

\[ \text{LR}(\mathbf{y}_{\text{obs}}) = \frac{f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})}{f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})}, \tag{20.100} \]

and comparing the result to the threshold $\pi_1 / \pi_0$; see (20.40).

Often, however, the densities $f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})$ and $f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})$ appearing in (20.100)
are not given directly. Instead we are given the density of $\Theta$ and the conditional
density of $\mathbf{Y}$ given $(H, \Theta)$. (We shall encounter such a situation in Chapter 27
when we discuss noncoherent communications.) In such cases we can compute the
conditional density $f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})$ as follows:

\begin{align*}
f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) &= \int_{\theta} f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}_{\text{obs}}, \theta) \, d\theta \\
&= \int_{\theta} f_{\mathbf{Y}|\Theta=\theta,H=0}(\mathbf{y}_{\text{obs}}) \, f_{\Theta|H=0}(\theta) \, d\theta \\
&= \int_{\theta} f_{\mathbf{Y}|\Theta=\theta,H=0}(\mathbf{y}_{\text{obs}}) \, f_{\Theta}(\theta) \, d\theta, \tag{20.101}
\end{align*}

where the first equality follows because from the joint density one obtains the
marginal density by integrating out the variable in which we are not interested;
the second by the definition of the conditional density; and the final equality from
our assumption that $\Theta$ and $H$ are independent. (In computations such as these
it is best to think about the conditioning on $H = 0$ as defining a new law on
$(\mathbf{Y}, \Theta)$, one to which all the regular probabilistic manipulations, such as
marginalization and computation of conditional densities, continue to apply. We
thus simply think of the conditioning on $H = 0$ as specifying the joint law of $(\mathbf{Y}, \Theta)$
that we have in mind.)

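Repeating the calculation under $H = 1$ we obtain that the likelihood-ratio function
is given by

\[ \text{LR}(\mathbf{y}_{\text{obs}}) = \frac{\int_{\theta} f_{\mathbf{Y}|\Theta=\theta,H=0}(\mathbf{y}_{\text{obs}}) \, f_{\Theta}(\theta) \, d\theta}{\int_{\theta} f_{\mathbf{Y}|\Theta=\theta,H=1}(\mathbf{y}_{\text{obs}}) \, f_{\Theta}(\theta) \, d\theta}. \tag{20.102} \]

The case where $\Theta$ is discrete can be similarly addressed. An optimal decision rule
can now be derived based on this expression for the likelihood-ratio function and
on the MAP rule (20.40).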
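
To make the averaging over the unobserved parameter concrete, here is a minimal numerical sketch. The model is hypothetical and chosen only for illustration: the parameter is uniform on an interval, and, given the parameter value θ, the scalar observation is Gaussian of mean +θ under H = 0 and of mean −θ under H = 1. The integrals in the likelihood ratio are approximated by the midpoint rule, and ties are resolved in favor of guessing "H = 0".

```python
import math


def gaussian_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def marginal_density(y_obs, h, theta_lo=0.5, theta_hi=1.5, sigma=1.0, steps=2000):
    """Approximate f_{Y|H=h}(y_obs) by integrating out the nuisance parameter.

    Hypothetical model (an assumption of this sketch): Theta is uniform on
    [theta_lo, theta_hi]; given Theta = theta, Y ~ N(+theta, sigma^2) if h == 0
    and Y ~ N(-theta, sigma^2) if h == 1.
    """
    sign = 1.0 if h == 0 else -1.0
    width = (theta_hi - theta_lo) / steps
    f_theta = 1.0 / (theta_hi - theta_lo)          # uniform density of Theta
    total = 0.0
    for k in range(steps):
        theta = theta_lo + (k + 0.5) * width       # midpoint of the k-th cell
        total += gaussian_pdf(y_obs, sign * theta, sigma) * f_theta * width
    return total


def likelihood_ratio(y_obs, **model):
    """Ratio of the two marginal densities, each averaged over Theta."""
    return marginal_density(y_obs, 0, **model) / marginal_density(y_obs, 1, **model)


def map_guess(y_obs, pi0, **model):
    """Compare the likelihood ratio to pi1/pi0; ties go to 'H = 0'."""
    pi1 = 1.0 - pi0
    return 0 if likelihood_ratio(y_obs, **model) >= pi1 / pi0 else 1


if __name__ == "__main__":
    for y in (-1.0, 0.0, 0.2, 1.0):
        print(y, likelihood_ratio(y), map_guess(y, pi0=0.5))
```

Note that the averaging is performed on the densities, separately in the numerator and in the denominator, and not on the likelihood ratio itself.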



20.15.2 Random Parameter Observed 

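When the random parameter is observed to be $\Theta = \theta_{\text{obs}}$, we merely view the
problem as a standard hypothesis testing problem with the observation consisting
of $\mathbf{Y}$ and $\Theta$. That is, we base our decision on the likelihood-ratio function

\[ \text{LR}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) = \frac{f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})}{f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})}. \tag{20.103} \]

The additional twist is that because $\Theta$ is independent of $H$ we have

\begin{align*}
f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) &= f_{\Theta|H=0}(\theta_{\text{obs}}) \, f_{\mathbf{Y}|\Theta=\theta_{\text{obs}},H=0}(\mathbf{y}_{\text{obs}}) \\
&= f_{\Theta}(\theta_{\text{obs}}) \, f_{\mathbf{Y}|\Theta=\theta_{\text{obs}},H=0}(\mathbf{y}_{\text{obs}}), \tag{20.104}
\end{align*}

where the second equality follows from the independence of $\Theta$ and $H$. Repeating
for the conditional law of the pair $(\mathbf{Y}, \Theta)$ given $H = 1$ we have

\[ f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) = f_{\Theta}(\theta_{\text{obs}}) \, f_{\mathbf{Y}|\Theta=\theta_{\text{obs}},H=1}(\mathbf{y}_{\text{obs}}). \tag{20.105} \]

Consequently, by (20.103), (20.104), and (20.105), we obtain that for $\theta_{\text{obs}}$ satisfying
$f_{\Theta}(\theta_{\text{obs}}) \ne 0$

\[ \text{LR}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) = \frac{f_{\mathbf{Y}|H=0,\Theta=\theta_{\text{obs}}}(\mathbf{y}_{\text{obs}})}{f_{\mathbf{Y}|H=1,\Theta=\theta_{\text{obs}}}(\mathbf{y}_{\text{obs}})}. \tag{20.106} \]

An optimal decision rule can be again derived based on this expression for the
likelihood-ratio and on the MAP rule (20.40).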
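
The cancellation of the parameter's density in the likelihood ratio can be seen numerically. The sketch below uses a hypothetical scalar model (our assumption, for illustration only): given that the parameter equals θ, the observation is Gaussian of mean +θ under H = 0 and of mean −θ under H = 1. The ratio of the joint densities coincides with the ratio of the conditional densities, whatever (nonzero) value the density of the parameter takes at the observed point.

```python
import math


def gaussian_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def lr_from_joint(y_obs, theta_obs, f_theta_at_obs, sigma=1.0):
    """Ratio of joint densities, each factored as f_Theta times a conditional density.

    f_theta_at_obs is the (nonzero) value of the density of Theta at theta_obs.
    """
    joint_h0 = f_theta_at_obs * gaussian_pdf(y_obs, +theta_obs, sigma)
    joint_h1 = f_theta_at_obs * gaussian_pdf(y_obs, -theta_obs, sigma)
    return joint_h0 / joint_h1


def lr_conditional(y_obs, theta_obs, sigma=1.0):
    """Ratio of conditional densities; the density of Theta does not appear."""
    return gaussian_pdf(y_obs, +theta_obs, sigma) / gaussian_pdf(y_obs, -theta_obs, sigma)


if __name__ == "__main__":
    y_obs, theta_obs = 0.3, 1.2
    for f_theta in (0.01, 1.0, 7.5):
        print(lr_from_joint(y_obs, theta_obs, f_theta), lr_conditional(y_obs, theta_obs))
```

For this particular model the ratio works out to exp(2 y θ / σ²), so the observed parameter value acts as a known scaling of the observation.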

20.16 Mathematical Notes 

A standard reference on hypothesis testing is (Lehmann and Romano, 2005). It 
also contains a measure-theoretic treatment of the subject. For a precise math- 
ematical definition of the condition X—o—Y—o—Z we refer the reader to (Loeve, 
1963, Section 25.3). For a measure-theoretic treatment of sufficient statistic see 
(Loeve, 1963, Section 24.4), (Billingsley, 1995, Section 34), (Romano and Siegel, 
1986, pp. 154-156), and (Halmos and Savage, 1949). For a measure-theoretic treat- 
ment of the notion of conditional distribution see, for example, (Billingsley, 1995, 
Chapter 6), (Williams, 1991, Chapter 9), or (Lehmann and Romano, 2005, Chap- 
ter 2). 

20.17 Exercises 

Exercise 20.1 (Hypothesis Testing). Let $H$ take on the values 0 and 1 equiprobably.
Conditional on $H = 0$, the observable $Y$ is equal to $a + Z$, where $Z$ is a Laplace RV, i.e.,
is of density

\[ f_Z(z) = \frac{1}{2} e^{-|z|}, \quad z \in \mathbb{R}, \]

and $a > 0$ is a given constant. Conditional on $H = 1$, the observable $Y$ is given by $-a + Z$.

(i) Find and draw the densities $f_{Y|H=0}(\cdot)$ and $f_{Y|H=1}(\cdot)$.

(ii) Find an optimal rule for guessing $H$ based on $Y$.

(iii) Compute the optimal probability of error.

(iv) Compute the Bhattacharyya Bound.

Exercise 20.2 (A Discrete Multi-Dimensional Problem). Let $H$ take on the values 0
and 1 according to the prior $(\pi_0, \pi_1)$. Let the observation $\mathbf{Y} = (Y_1, \ldots, Y_n)^{\mathsf{T}}$ be an
$n$-dimensional binary vector. Conditional on $H = 0$, the components of the vector $\mathbf{Y}$ are
IID with

\[ \Pr[Y_\ell = 1 \mid H = 0] = 1 - \Pr[Y_\ell = 0 \mid H = 0] = 0.25, \quad \ell = 1, \ldots, n. \]

Conditional on $H = 1$, the components are IID with

\[ \Pr[Y_\ell = 1 \mid H = 1] = 1 - \Pr[Y_\ell = 0 \mid H = 1] = 0.75, \quad \ell = 1, \ldots, n. \]

(i) Find an optimal rule for guessing $H$ based on $\mathbf{Y}$.
(ii) Compute the optimal probability of error.
(iii) Compute the Bhattacharyya Bound.

Hint: You may need to treat the cases of n even and n odd separately. 

Exercise 20.3 (A Multi-Antenna Receiver). Let $H$ take on the values 0 and 1
equiprobably. We wish to guess $H$ based on the random variables $Y_1$ and $Y_2$. Conditional on
$H = 0$,

\[ Y_1 = A + Z_1, \quad Y_2 = A + Z_2, \]

and conditional on $H = 1$,

\[ Y_1 = -A + Z_1, \quad Y_2 = -A + Z_2. \]

Here $A$ is a positive constant, and $Z_1 \sim \mathcal{N}(0, \sigma_1^2)$, $Z_2 \sim \mathcal{N}(0, \sigma_2^2)$, and $H$ are independent.

(i) Find an optimal rule for guessing $H$ based on $(Y_1, Y_2)$.

(ii) Draw the decision regions in the $(Y_1, Y_2)$-plane for the special case where $\sigma_1 = 2\sigma_2$.

(iii) Returning to the general case, find a one-dimensional sufficient statistic.

(iv) Find the optimal probability of error in terms of $\sigma_1$, $\sigma_2$, and $A$.

(v) Consider a suboptimal receiver that declares "$H = 0$" if $Y_1 + Y_2 > 0$, and otherwise
declares "$H = 1$." Evaluate the probability of error for this decoder as a function
of $\sigma_1$, $\sigma_2$, and $A$.

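Exercise 20.4 (Binary Hypothesis Testing with General Costs). Let $H$ take on the values
0 and 1 according to the prior $(\pi_0, \pi_1)$. The observable $\mathbf{Y}$ has conditional densities $f_{\mathbf{Y}|H=0}(\cdot)$
and $f_{\mathbf{Y}|H=1}(\cdot)$. Based on $\mathbf{Y}$, we wish to guess the value of $H$. Let the guess associated
with $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ be denoted by $\phi_{\text{Guess}}(\mathbf{y}_{\text{obs}})$. Guessing "$H = \eta$" when $H = \nu$ costs $c(\eta, \nu)$,
where $c(\cdot, \cdot)$ is a given function from $\{0,1\} \times \{0,1\}$ to the nonnegative reals. Find a
decision rule that minimizes the expected cost

\[ \mathsf{E}\bigl[ c\bigl( \phi_{\text{Guess}}(\mathbf{Y}), H \bigr) \bigr] = \sum_{\nu=0}^{1} \pi_\nu \sum_{\eta=0}^{1} c(\eta, \nu) \, \Pr\bigl[ \phi_{\text{Guess}}(\mathbf{Y}) = \eta \bigm| H = \nu \bigr]. \]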

Exercise 20.5 (Binary Hypothesis Testing). Let $H$ take on the values 0 and 1 according
to the prior $(\pi_0, \pi_1)$, and let the observation consist of the RV $Y$. Conditional on $H$, the
densities of $Y$ are given for every $y \in \mathbb{R}$ by

\[ f_{Y|H=0}(y) = e^{-y} \, \mathrm{I}\{y \ge 0\}, \quad f_{Y|H=1}(y) = \beta \, e^{-y^2/2} \, \mathrm{I}\{y \ge 0\}, \]

where $\beta > 0$ is some constant.

(i) Determine $\beta$.

(ii) Find a decision rule that minimizes the probability of error.
(iii) For the rule that you have found, compute $\Pr(\text{error} \mid H = 0)$.

Hint: Different priors can lead to dramatically different decision rules.

Exercise 20.6 (Bhattacharyya Bound).

(i) Show that the Bhattacharyya Bound never exceeds 1/2. 
(ii) When is it equal to 1/2? 

Hint: You may find the Cauchy-Schwarz Inequality useful. 

Exercise 20.7 (The Bhattacharyya Bound for Conditionally IID Observations). Consider
a binary hypothesis testing problem where, conditional on $H = 0$, the $J$ components of
the observed random vector $\mathbf{Y}$ are IID with each component of density $f_0(\cdot)$. Conditional
on $H = 1$ the components of $\mathbf{Y}$ are IID with each component of density $f_1(\cdot)$. Express
the Bhattacharyya Bound in terms of $J$ and

\[ \int_{-\infty}^{\infty} \sqrt{f_0(y) \, f_1(y)} \, dy. \]



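Exercise 20.8 (Error Probability and $L_1$-Distance). Consider the setting of Theorem 20.5.2
when $H$ has a uniform prior. Show that in this case (20.26) can also be written as

\[ \Pr\bigl[ \phi_{\text{Guess}}(\mathbf{Y}) \ne H \bigr] = \frac{1}{2} - \frac{1}{4} \int_{\mathbb{R}^d} \bigl| f_{\mathbf{Y}|H=0}(\mathbf{y}) - f_{\mathbf{Y}|H=1}(\mathbf{y}) \bigr| \, d\mathbf{y}. \]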

Exercise 20.9 (Conditionally Poisson Observations). A RV $X$ is said to have a Poisson
distribution of parameter ("intensity") $\lambda$, where $\lambda$ is some nonnegative real number, if $X$
takes value in the nonnegative integers and

\[ \Pr[X = n] = e^{-\lambda} \frac{\lambda^n}{n!}, \quad n = 0, 1, 2, \ldots \]

(i) Find the Moment Generating Function of a Poisson RV of intensity $\lambda$.

(ii) Show that if $X$ and $Y$ are independent Poisson random variables of intensities $\lambda_x$
and $\lambda_y$, then their sum $X + Y$ is Poisson with parameter $\lambda_x + \lambda_y$.

(iii) Let $H$ take on the values 0 and 1 according to the prior $(\pi_0, \pi_1)$. We wish to
guess $H$ based on the RV $Y$. Conditional on $H = 0$, the observation $Y$ is Poisson
of intensity $\alpha + \lambda$, whereas conditional on $H = 1$ it is Poisson of intensity $\beta + \lambda$.
Here $\alpha, \beta, \lambda$ are known nonnegative constants. Show that the optimal probability
of error is monotonically nondecreasing in $\lambda$.

Hint: For Part (iii) recall Part (ii) and that no randomized decision rule can outperform 
an optimal deterministic rule. 

Exercise 20.10 (Optical Communication). Consider an optical communication system
that uses binary on/off keying at a rate of $10^8$ bits per second. At the beginning of each
time interval of duration $10^{-8}$ seconds a new data bit $D$ enters the transmitter. If $D = 0$,
the laser is turned off for the duration of the interval; otherwise, if $D = 1$, the laser is
turned on. The receiver counts the number $Y$ of photons received during the interval.
Assume that, conditional on $D$, the observation $Y$ is a Poisson RV whose conditional
PMF is

\[ \Pr[Y = y \mid D = 0] = \frac{e^{-\mu} \mu^y}{y!}, \quad y = 0, 1, 2, \ldots, \tag{20.107} \]

\[ \Pr[Y = y \mid D = 1] = \frac{e^{-\lambda} \lambda^y}{y!}, \quad y = 0, 1, 2, \ldots, \tag{20.108} \]

where $\lambda > \mu > 0$. Further assume that $\Pr[D = 0] = \Pr[D = 1] = 1/2$.

(i) Find an optimal guessing rule for guessing $D$ based on $Y$.

(ii) Compute the optimal probability of error. (Not necessarily in closed form.)

(iii) Suppose that we now transmit each data bit over two time intervals, each of duration
$10^{-8}$ seconds. (The system now supports a data rate of $0.5 \times 10^8$ bits per second.)
The receiver produces the photon counts $Y_1$ and $Y_2$ over the two intervals. Assume
that, conditional on $D = 0$, the counts $Y_1$ and $Y_2$ are IID with the PMF (20.107)
and that, conditional on $D = 1$, they are IID with the PMF (20.108). Find a
one-dimensional sufficient statistic for the problem and use it to find an optimal
decision rule.

Hint: For Part (iii), recall Part (ii) of Exercise 20.9.

Exercise 20.11 (Monotone Likelihood Ratio and Log-Concavity). Let $H$ take on the
values 0 and 1 according to the nondegenerate prior $(\pi_0, \pi_1)$. Conditional on $H = 0$, the
observation $Y$ is given by

\[ Y = \xi_0 + Z, \]

where $\xi_0 \in \mathbb{R}$ is some deterministic number and $Z$ is a RV of PDF $f_Z(\cdot)$. Conditional on
$H = 1$, the observation $Y$ is given by

\[ Y = \xi_1 + Z, \]

where $\xi_1 > \xi_0$.

(i) Show that if the PDF $f_Z(\cdot)$ is positive and is such that

\[ f_Z(y_1 - \xi_0) \, f_Z(y_0 - \xi_1) \le f_Z(y_1 - \xi_1) \, f_Z(y_0 - \xi_0), \quad (y_1 > y_0, \; \xi_1 > \xi_0), \tag{20.109} \]

then an optimal decision rule is to guess "$H = 0$" if $Y \le y^\star$ and to guess "$H = 1$"
if $Y > y^\star$ for some real number $y^\star$.

(ii) Show that if $z \mapsto \log f_Z(z)$ is a concave function, then (20.109) is satisfied.

Mathematicians state this result by saying that if $g \colon \mathbb{R} \to \mathbb{R}$ is positive, then the mapping
$(x, y) \mapsto g(x - y)$ has the Total Positivity property of Order 2 if, and only if, $g$ is
log-concave (Marshall and Olkin, 1979, Chapter 18, Section A, Example A.10). Statisticians
state this result by saying that a location family generated by a positive PDF $f(\cdot)$ has
monotone likelihood ratios if, and only if, $f(\cdot)$ is log-concave. For more on distributions
with monotone likelihood ratios see (Lehmann and Romano, 2005, Chapter 3, Section
3.4).

Hint: For Part (ii) recall that a function $g \colon \mathbb{R} \to \mathbb{R}$ is concave if for any $a < b$ and
$0 < \alpha < 1$ we have $g(\alpha a + (1 - \alpha) b) \ge \alpha g(a) + (1 - \alpha) g(b)$. You may like to proceed as
follows. Show that if $g$ is concave then

\[ g(a - \Delta_2) + g(a + \Delta_2) \le g(a - \Delta_1) + g(a + \Delta_1), \quad |\Delta_1| \le |\Delta_2|. \]

Defining $g(z) = \log f_Z(z)$, show that the logarithm of the LHS of (20.109) can be written
as

\[ g\Bigl( \bar{y} - \bar{\xi} + \frac{1}{2} \Delta_y + \frac{1}{2} \Delta_\xi \Bigr) + g\Bigl( \bar{y} - \bar{\xi} - \frac{1}{2} \Delta_y - \frac{1}{2} \Delta_\xi \Bigr), \]

where

\[ \bar{y} = (y_0 + y_1)/2, \quad \bar{\xi} = (\xi_0 + \xi_1)/2, \quad \Delta_y = y_1 - y_0, \quad \Delta_\xi = \xi_1 - \xi_0. \]

Show that the logarithm of the RHS of (20.109) is given by

\[ g\Bigl( \bar{y} - \bar{\xi} + \frac{1}{2} \Delta_y - \frac{1}{2} \Delta_\xi \Bigr) + g\Bigl( \bar{y} - \bar{\xi} + \frac{1}{2} \Delta_\xi - \frac{1}{2} \Delta_y \Bigr). \]

Exercise 20.12 (Is a Uniform Prior the Worst Prior?). Based on an observation $Y$, we
wish to guess the value of a RV $H$ taking on the values 0 and 1 according to the prior
$(\pi_0, \pi_1)$. Conditional on $H = 0$, the observation $Y$ is uniform over the interval $[0, 1]$, and,
conditional on $H = 1$, it is uniform over the interval $[0, 1/2]$.

(i) Find an optimal rule for guessing $H$ based on the observation $Y$. Note that the
rule may depend on $\pi_0$.

(ii) Let $p^*(\text{error}; \pi_0)$ denote the optimal probability of error. Find $p^*(\text{error}; \pi_0)$ and
plot it as a function of $\pi_0$ in the range $0 \le \pi_0 \le 1$.

(iii) Which value of $\pi_0$ maximizes $p^*(\text{error}; \pi_0)$?

Consider now the general problem where the RV $Y$ is of conditional densities $f_{Y|H=0}(\cdot)$,
$f_{Y|H=1}(\cdot)$, and $H$ is of prior $(\pi_0, \pi_1)$. Let $p^*(\text{error}; \pi_0)$ denote the optimal probability of
error for guessing $H$ based on $Y$.

(iv) Prove that

\[ p^*\Bigl( \text{error}; \frac{1}{2} \Bigr) \ge \frac{1}{2} \, p^*(\text{error}; \pi_0) + \frac{1}{2} \, p^*(\text{error}; 1 - \pi_0), \quad \pi_0 \in [0, 1]. \tag{20.110a} \]

(v) Show that if the densities $f_{Y|H=0}(\cdot)$ and $f_{Y|H=1}(\cdot)$ satisfy

\[ f_{Y|H=0}(y) = f_{Y|H=1}(-y), \quad y \in \mathbb{R}, \tag{20.110b} \]

then

\[ p^*(\text{error}; \pi_0) = p^*(\text{error}; 1 - \pi_0), \quad \pi_0 \in [0, 1]. \tag{20.110c} \]

(vi) Show that if (20.110b) holds, then the uniform prior is the worst prior:

\[ p^*(\text{error}; \pi_0) \le p^*(\text{error}; 1/2), \quad \pi_0 \in [0, 1]. \tag{20.110d} \]

Hint: For Part (iv) you might like to consider a new setup. In the new setup $H = M \oplus S$,
where $\oplus$ denotes the exclusive-or operation and where the binary random variables $M$
and $S$ are independent with $S$ taking value in $\{0, 1\}$ equiprobably and with $\Pr[M = 0] =
1 - \Pr[M = 1] = \pi_0$. Assume that in the new setup $(M, S) -\!\!\circ\!\!- H -\!\!\circ\!\!- Y$ and that the
conditional density of $Y$ given $H = 0$ is $f_{Y|H=0}(\cdot)$ and given $H = 1$ it is $f_{Y|H=1}(\cdot)$.
Compare now the performance of an optimal decision rule for guessing $H$ based on $Y$
with the performance of an optimal decision rule for guessing $H$ based on the pair $(Y, S)$.
Express these probabilities of error in terms of the parameters of the original problem.

Exercise 20.13 (Hypothesis Testing with a Random Parameter). Let $Y = X + AZ$,
where $X$, $A$, and $Z$ are independent random variables with $X$ taking on the values $\pm 1$
equiprobably, $A$ taking on the values 2 and 3 equiprobably, and $Z \sim \mathcal{N}(0, \sigma^2)$.

(i) Find an optimal rule for guessing X based on the pair (Y, A). 
(ii) Repeat when you observe only Y. 

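Exercise 20.14 (Bounding the Conditional Probability of Error). Show that when the
prior is uniform

\[ p_{\text{MAP}}(\text{error} \mid H = 0) \le \int \sqrt{f_{\mathbf{Y}|H=0}(\mathbf{y}) \, f_{\mathbf{Y}|H=1}(\mathbf{y})} \, d\mathbf{y}. \]

Exercise 20.15 (Upper Bounds on the Conditional Probability of Error).

(i) Let $H$ take on the values 0 and 1 according to the nondegenerate prior $(\pi_0, \pi_1)$. Let
the observation $\mathbf{Y}$ have the conditional densities $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$. Show
that for every $\rho > 0$

\[ p_{\text{MAP}}(\text{error} \mid H = 0) \le \Bigl( \frac{\pi_1}{\pi_0} \Bigr)^{\rho} \int f^{\rho}_{\mathbf{Y}|H=1}(\mathbf{y}) \, f^{1-\rho}_{\mathbf{Y}|H=0}(\mathbf{y}) \, d\mathbf{y}. \]

(ii) A suboptimal decoder guesses "$H = 0$" if $g_0(\mathbf{y}) > g_1(\mathbf{y})$; guesses "$H = 1$" if
$g_0(\mathbf{y}) < g_1(\mathbf{y})$; and otherwise tosses a coin. Here $g_0(\cdot)$ and $g_1(\cdot)$ are arbitrary
positive functions. Show that for this decoder

\[ p(\text{error} \mid H = 0) \le \int \Bigl( \frac{g_1(\mathbf{y})}{g_0(\mathbf{y})} \Bigr)^{\rho} f_{\mathbf{Y}|H=0}(\mathbf{y}) \, d\mathbf{y}, \quad \rho > 0. \]

Hint: In Part (i) show that you can upper-bound $\mathrm{I}\bigl\{ \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}) / \bigl( \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}) \bigr) \ge 1 \bigr\}$ by
$\bigl( \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}) / \bigl( \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}) \bigr) \bigr)^{\rho}$.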

Exercise 20.16 (The Hellinger Distance). The Hellinger distance between the densities
$f(\cdot)$ and $g(\cdot)$ is defined as the square root of

\[ \frac{1}{2} \int \Bigl( \sqrt{f(\xi)} - \sqrt{g(\xi)} \Bigr)^2 d\xi \]

(though some authors drop the one-half).

(i) Show that the Hellinger distance between $f(\cdot)$ and $h(\cdot)$ is upper-bounded by the
sum of the Hellinger distances between $f(\cdot)$ and $g(\cdot)$ and between $g(\cdot)$ and $h(\cdot)$.

(ii) Relate the Hellinger distance to the Bhattacharyya Bound. 

(iii) Show that the Hellinger distance is upper-bounded by one. 

Exercise 20.17 (Artifacts of Suboptimality). Let $H$ take on the values 0 and 1
equiprobably. Conditional on $H = 0$, the observation $Y$ is $\mathcal{N}(1, \sigma^2)$, and, conditional on $H = 1$,
it is $\mathcal{N}(-1, \sigma^2)$. Alice guesses "$H = 0$" if $Y > 2$ and guesses "$H = 1$" otherwise.

(i) Compute the probability that Alice errs as a function of $\sigma^2$.

(ii) Show that this probability is not monotonically nondecreasing in $\sigma^2$.

(iii) Does her guessing rule minimize the probability of error? 

(iv) Show that if you are obliged to use her rule, then adding noise to Y prior to feeding 
it to her detector may be beneficial. 

Exercise 20.18 (The Bhattacharyya Bound and a Random Parameter). Let $\Theta$ be
independent of $H$ and of density $f_{\Theta}(\cdot)$. Express the Bhattacharyya Bound on the probability
of guessing $H$ incorrectly in terms of $f_{\Theta}(\cdot)$, $f_{\mathbf{Y}|\Theta=\theta,H=0}(\cdot)$ and $f_{\mathbf{Y}|\Theta=\theta,H=1}(\cdot)$. Treat the
case where $\Theta$ is not observed and the case where it is observed separately. Show that the
Bhattacharyya Bound in the former case is always at least as large as in the latter case.



Chapter 21 

Multi-Hypothesis Testing

21.1 Introduction 

In Chapter 20 we discussed how to guess the outcome of a binary random variable.
We now extend the discussion to random variables that take on more than two (but
still a finite number of) values. Statisticians call this problem "multi-hypothesis
testing" to indicate that there may be more than two hypotheses. Rather than
using $H$, we now denote the random variable whose outcome we wish to guess
by $M$. (In Chapter 20 we used $H$ for "hypothesis;" now we use $M$ for "message.")
We denote the number of possible values that $M$ can take by $\mathsf{M}$ and assume that
$\mathsf{M} \ge 2$. (The case $\mathsf{M} = 2$ corresponds to binary hypothesis testing.) As before the
"labels" are not important and there is no loss in generality in assuming that $M$
takes value in the set $\mathcal{M} = \{1, \ldots, \mathsf{M}\}$. (In the binary case we used the traditional
labels 0 and 1 but now we prefer $1, 2, \ldots, \mathsf{M}$.)

21.2 The Setup 

A random variable $M$ takes value in the set $\mathcal{M} = \{1, \ldots, \mathsf{M}\}$, where $\mathsf{M} \ge 2$,
according to the prior

\[ \pi_m = \Pr[M = m], \quad m \in \mathcal{M}, \tag{21.1} \]

where

\[ \pi_m \ge 0, \quad m \in \mathcal{M}, \tag{21.2} \]

and where

\[ \sum_{m \in \mathcal{M}} \pi_m = 1. \tag{21.3} \]

We say that the prior is nondegenerate if

\[ \pi_m > 0, \quad m \in \mathcal{M}, \tag{21.4} \]

with the inequalities being strict, so $M$ can take on any value in $\mathcal{M}$ with positive
probability. We say that the prior is uniform if

\[ \pi_1 = \cdots = \pi_{\mathsf{M}} = \frac{1}{\mathsf{M}}. \tag{21.5} \]

The observation is a random vector $\mathbf{Y}$ taking value in $\mathbb{R}^d$. We assume that for
each $m \in \mathcal{M}$ the distribution of $\mathbf{Y}$ conditional on $M = m$ has the density¹

\[ f_{\mathbf{Y}|M=m}(\cdot), \quad m \in \mathcal{M}, \tag{21.6} \]

where $f_{\mathbf{Y}|M=m}(\cdot)$ is a nonnegative Borel measurable function that integrates to
one over $\mathbb{R}^d$.

A guessing rule is a Borel measurable function $\phi_{\text{Guess}} \colon \mathbb{R}^d \to \mathcal{M}$ from the space
of possible observations $\mathbb{R}^d$ to the set of possible messages $\mathcal{M}$. We think about
$\phi_{\text{Guess}}(\mathbf{y}_{\text{obs}})$ as the guess we form after observing that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$. The error
probability associated with the guessing rule $\phi_{\text{Guess}}(\cdot)$ is given by

\[ \Pr\bigl[ \phi_{\text{Guess}}(\mathbf{Y}) \ne M \bigr]. \tag{21.7} \]

Note that two sources of randomness determine whether we err or not: the
realization of $M$ and the generation of $\mathbf{Y}$ conditional on that realization. A guessing
rule is said to be optimal if no other guessing rule achieves a lower probability
of error.² The optimal error probability $p^*(\text{error})$ is the probability of error
associated with an optimal decision rule. In this chapter we shall derive optimal
decision rules and study the optimal probability of error.



21.3 Optimal Guessing 

Having observed that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, we would like to guess $M$. An optimal guessing
rule can be derived, as in the binary case, by first considering the scenario where 
there are no observables. Its extension to the more interesting case where we 
observe Y is straightforward. 

21.3.1 Guessing in the Absence of Observables 

In this scenario there are only $\mathsf{M}$ deterministic decision rules to choose from: the
decision rule "guess 1", the decision rule "guess 2", etc. If we employ the "guess 1"
rule, then we are correct if $M$ is indeed equal to 1 and thus with probability of
success $\pi_1$ and corresponding probability of error of $1 - \pi_1$. In general, if we employ
the "guess $m$" rule for some $m \in \mathcal{M}$, then our probability of success is $\pi_m$. Thus,
of the $\mathsf{M}$ different rules at our disposal, the one that has the highest probability
of success is the "guess $\tilde{m}$" rule, where $\tilde{m}$ is the outcome that is a priori the most
likely. If this $\tilde{m}$ is not unique, then guessing any one of the outcomes that have
the highest a priori probability is optimal.



1 We feel no remorse for limiting ourselves to conditional distributions possessing a density. 
The reason is that, while the reader is encouraged to assume that the densities are with respect to 
the Lebesgue measure, this assumption is never used in the text. And using the Radon-Nikodym 
Theorem (Billingsley, 1995, Section 32), one can show that even in the most general case there 
exists a measure on $\mathbb{R}^d$ with respect to which the conditional laws of $\mathbf{Y}$ conditional on each of
the possible values of $M$ are absolutely continuous. That measure can be taken, for example, as
the sum of the conditional laws corresponding to each of the possible values that $M$ can take.

2 As in the case of binary hypothesis testing, an optimal guessing rule always exists.

We conclude that in the absence of observables, the guessing rule "guess $\tilde{m}$" is
optimal if, and only if,

\[ \pi_{\tilde{m}} = \max_{m' \in \mathcal{M}} \pi_{m'}. \tag{21.8} \]

For an optimal guessing rule the probability of success is

\[ p^*(\text{correct}) = \max_{m' \in \mathcal{M}} \{ \pi_{m'} \}, \tag{21.9} \]

and the optimal error probability is thus

\[ p^*(\text{error}) = 1 - \max_{m' \in \mathcal{M}} \{ \pi_{m'} \}. \tag{21.10} \]
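
The rule for guessing without observables amounts to an argmax over the prior. A minimal sketch follows; the function name `optimal_blind_guesses` and the representation of the prior as a dictionary are our own choices for illustration.

```python
def optimal_blind_guesses(prior):
    """Return (list of optimal guesses, optimal error probability) with no observation.

    prior: dict mapping each message m to its prior probability.
    Every message attaining the largest prior probability is an optimal guess,
    and the optimal error probability is one minus that largest probability.
    """
    best = max(prior.values())
    guesses = [m for m, p in prior.items() if p == best]
    return guesses, 1.0 - best


if __name__ == "__main__":
    print(optimal_blind_guesses({1: 0.2, 2: 0.5, 3: 0.3}))   # unique most likely message
    print(optimal_blind_guesses({1: 0.4, 2: 0.4, 3: 0.2}))   # two equally good guesses
```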

21.3.2 The Joint Law of M and Y 

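Using the prior $\{\pi_m\}$ and the conditional densities $\{f_{\mathbf{Y}|M=m}(\cdot)\}$, we can express
the unconditional density of $\mathbf{Y}$ as

\[ f_{\mathbf{Y}}(\mathbf{y}) = \sum_{m \in \mathcal{M}} \pi_m \, f_{\mathbf{Y}|M=m}(\mathbf{y}), \quad \mathbf{y} \in \mathbb{R}^d. \tag{21.11} \]

As in Section 20.4, we define for every $m \in \mathcal{M}$ and for every $\mathbf{y}_{\text{obs}} \in \mathbb{R}^d$ the
conditional probability that $M = m$ conditional on $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ by

\[ \Pr[M = m \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = \begin{cases} \dfrac{\pi_m \, f_{\mathbf{Y}|M=m}(\mathbf{y}_{\text{obs}})}{f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}})} & \text{if } f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}}) > 0, \\[2ex] \dfrac{1}{\mathsf{M}} & \text{otherwise.} \end{cases} \tag{21.12} \]

By an argument similar to the one proving (20.12) we have

\[ \Pr\bigl[ \mathbf{Y} \in \{ \mathbf{y} \in \mathbb{R}^d : f_{\mathbf{Y}}(\mathbf{y}) = 0 \} \bigr] = 0, \tag{21.13} \]

which can also be written as

\[ \Pr\bigl[ f_{\mathbf{Y}}(\mathbf{Y}) = 0 \bigr] = 0. \]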

21.3.3 Guessing in the Presence of Observables 

The problem of guessing in the presence of an observable is very similar to the
one without observables. The intuition is that after observing that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, we
associate with each $m \in \mathcal{M}$ the a posteriori probability $\Pr[M = m \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$ and
then guess $M$ as though there were no observables. Thus, rather than choosing
the message that has the highest a priori probability as we do in the absence of
observables, we should now choose the message that has the highest a posteriori
probability.

After having observed that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ we should thus guess "$\tilde{m}$" where $\tilde{m}$ is the
outcome in $\mathcal{M}$ that has the highest a posteriori probability. If more than one outcome
attains the highest a posteriori probability, then we say that a tie has occurred
and we need to resolve this tie by picking one (it does not matter which) of the
outcomes that attains the maximum a posteriori probability. We thus guess "$\tilde{m}$,"
in analogy to (21.8), only if

\[ \Pr[M = \tilde{m} \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = \max_{m' \in \mathcal{M}} \bigl\{ \Pr[M = m' \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] \bigr\}. \]

(We shall later define the Maximum A Posteriori guessing rule as a randomized
decision rule that picks uniformly at random from the outcomes that have the
highest a posteriori probability; see Definition 21.3.2 ahead.)

In analogy with (21.9) we have that for this optimal rule

\[ p^*(\text{correct} \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}) = \max_{m' \in \mathcal{M}} \bigl\{ \Pr[M = m' \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] \bigr\}, \]

and in analogy with (21.10),

\[ p^*(\text{error} \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}) = 1 - \max_{m' \in \mathcal{M}} \bigl\{ \Pr[M = m' \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] \bigr\}. \]

Consequently, the unconditional optimal probability of error can be expressed as

\[ p^*(\text{error}) = \int_{\mathbb{R}^d} \Bigl( 1 - \max_{m' \in \mathcal{M}} \bigl\{ \Pr[M = m' \mid \mathbf{Y} = \mathbf{y}] \bigr\} \Bigr) f_{\mathbf{Y}}(\mathbf{y}) \, d\mathbf{y}, \]

where $f_{\mathbf{Y}}(\cdot)$ is the unconditional density function of $\mathbf{Y}$ and is given in (21.11).
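
The integral expression for the optimal probability of error can be checked numerically on a simple example. Wherever the unconditional density is positive, the a posteriori probability times the unconditional density equals the prior times the conditional density, so the integral equals one minus the integral of the largest of the prior-weighted conditional densities. The sketch below evaluates this by the midpoint rule for a hypothetical scalar example of our choosing: three equiprobable messages with Gaussian observations of means −2, 0, and 2 and unit variance. For such equally spaced means a direct calculation gives the error probability (4/3) Q(1), because the two outer messages err with probability Q(1) and the inner one with probability 2 Q(1).

```python
import math


def gaussian_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def optimal_error(prior, means, sigma=1.0, lo=-12.0, hi=12.0, steps=24000):
    """Approximate p*(error) = 1 - integral of max_m prior[m] * f_{Y|M=m}(y) dy.

    Hypothetical scalar model: given M = m, Y ~ N(means[m], sigma^2).
    The integration range [lo, hi] must cover essentially all the probability mass.
    """
    width = (hi - lo) / steps
    p_correct = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * width                  # midpoint of the k-th cell
        p_correct += max(p * gaussian_pdf(y, mu, sigma)
                         for p, mu in zip(prior, means)) * width
    return 1.0 - p_correct


if __name__ == "__main__":
    numeric = optimal_error([1 / 3, 1 / 3, 1 / 3], [-2.0, 0.0, 2.0])
    closed_form = (4.0 / 3.0) * 0.5 * math.erfc(1.0 / math.sqrt(2.0))
    print(numeric, closed_form)
```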

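We next proceed to make the above intuitive discussion more rigorous. We begin
by defining for every possible observation $\mathbf{y}_{\text{obs}} \in \mathbb{R}^d$ the set of outcomes of maximal
a posteriori probability:

\[ \tilde{\mathcal{M}}(\mathbf{y}_{\text{obs}}) = \Bigl\{ \tilde{m} \in \mathcal{M} : \Pr[M = \tilde{m} \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = \max_{m' \in \mathcal{M}} \Pr[M = m' \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] \Bigr\}. \tag{21.14} \]

As we next argue, this set can also be expressed as

\[ \tilde{\mathcal{M}}(\mathbf{y}_{\text{obs}}) = \Bigl\{ \tilde{m} \in \mathcal{M} : \pi_{\tilde{m}} \, f_{\mathbf{Y}|M=\tilde{m}}(\mathbf{y}_{\text{obs}}) = \max_{m' \in \mathcal{M}} \pi_{m'} \, f_{\mathbf{Y}|M=m'}(\mathbf{y}_{\text{obs}}) \Bigr\}. \tag{21.15} \]

This can be shown by treating the case $f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}}) > 0$ and the case $f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}}) = 0$
separately. In the former case, (21.15) is verified by noting that in this case we
have, by (21.12), that $\Pr[M = m' \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}_{\text{obs}}) / f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}})$, so
the result follows because scaling the scores of all the elements of a set by a positive
number that is common to them all ($1 / f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}})$) does not change the subset of
the elements with the highest score. In the latter case we note that, by (21.12),
we have for all $m' \in \mathcal{M}$ that $\Pr[M = m' \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = 1/\mathsf{M}$, so the RHS of (21.14)
is $\mathcal{M}$ and we also have by (21.11) for all $m' \in \mathcal{M}$ that $\pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}_{\text{obs}}) = 0$ so
the RHS of (21.15) is also $\mathcal{M}$.

Using the above definition of $\tilde{\mathcal{M}}(\mathbf{y}_{\text{obs}})$ we can now state the main theorem regarding
optimal guessing rules.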

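Theorem 21.3.1 (Optimal Multi-Hypothesis Testing). Let $M$ take value in the set
$\mathcal{M} = \{1, \ldots, \mathsf{M}\}$ with the prior (21.1), and let the observation $\mathbf{Y}$ be a random
vector taking value in $\mathbb{R}^d$ with conditional densities $f_{\mathbf{Y}|M=1}(\cdot), \ldots, f_{\mathbf{Y}|M=\mathsf{M}}(\cdot)$. Any
guessing rule $\phi^*_{\text{Guess}} \colon \mathbb{R}^d \to \mathcal{M}$ that satisfies

\[ \phi^*_{\text{Guess}}(\mathbf{y}_{\text{obs}}) \in \tilde{\mathcal{M}}(\mathbf{y}_{\text{obs}}), \quad \mathbf{y}_{\text{obs}} \in \mathbb{R}^d \tag{21.16} \]

is optimal. Here $\tilde{\mathcal{M}}(\mathbf{y}_{\text{obs}})$ is the set defined in (21.14) or (21.15).

Proof. Every (deterministic) guessing rule induces a partitioning of the space of
possible outcomes $\mathbb{R}^d$ into $\mathsf{M}$ disjoint sets $\mathcal{D}_1, \ldots, \mathcal{D}_{\mathsf{M}}$:

\[ \bigcup_{m \in \mathcal{M}} \mathcal{D}_m = \mathbb{R}^d, \tag{21.17a} \]

\[ \mathcal{D}_m \cap \mathcal{D}_{m'} = \emptyset, \quad m \ne m', \tag{21.17b} \]

where $\mathcal{D}_m$ is the set of observations that result in the guessing rule producing
the guess "$M = m$." Conversely, every partition $\mathcal{D}_1, \ldots, \mathcal{D}_{\mathsf{M}}$ of $\mathbb{R}^d$ corresponds
to some deterministic guessing rule that guesses "$M = m$" whenever $\mathbf{y}_{\text{obs}} \in \mathcal{D}_m$.
Searching for an optimal decision rule is thus equivalent to searching for an optimal
way to partition $\mathbb{R}^d$. For every partition $\mathcal{D}_1, \ldots, \mathcal{D}_{\mathsf{M}}$ the probability of success of
the guessing rule associated with it is given by

\begin{align*}
\Pr(\text{correct}) &= \sum_{m \in \mathcal{M}} \pi_m \Pr(\text{correct} \mid M = m) \\
&= \sum_{m \in \mathcal{M}} \pi_m \int_{\mathcal{D}_m} f_{\mathbf{Y}|M=m}(\mathbf{y}) \, d\mathbf{y} \\
&= \sum_{m \in \mathcal{M}} \pi_m \int_{\mathbb{R}^d} f_{\mathbf{Y}|M=m}(\mathbf{y}) \, \mathrm{I}\{\mathbf{y} \in \mathcal{D}_m\} \, d\mathbf{y} \\
&= \int_{\mathbb{R}^d} \Bigl( \sum_{m \in \mathcal{M}} \pi_m \, f_{\mathbf{Y}|M=m}(\mathbf{y}) \, \mathrm{I}\{\mathbf{y} \in \mathcal{D}_m\} \Bigr) d\mathbf{y}.
\end{align*}

To minimize the probability of error we maximize the probability of correct
decision. We thus need to find a partition $\mathcal{D}_1, \ldots, \mathcal{D}_{\mathsf{M}}$ that maximizes the last integral.

To maximize the integral we shall maximize the integrand

\[ \sum_{m \in \mathcal{M}} \pi_m \, f_{\mathbf{Y}|M=m}(\mathbf{y}) \, \mathrm{I}\{\mathbf{y} \in \mathcal{D}_m\}. \]

For a fixed value of $\mathbf{y}$, the value of the integrand depends on the set to which we
have assigned $\mathbf{y}$. If $\mathbf{y}$ was assigned to $\mathcal{D}_1$ (i.e., if $\mathbf{y} \in \mathcal{D}_1$), then all the terms in the
sum except for the first are zero, and the value of the integrand is $\pi_1 f_{\mathbf{Y}|M=1}(\mathbf{y})$.
More generally, if $\mathbf{y}$ was assigned to $\mathcal{D}_m$, then all the terms in the sum except for
the $m$-th term are zero, and the value of the integrand is $\pi_m f_{\mathbf{Y}|M=m}(\mathbf{y})$. For a
fixed value of $\mathbf{y}$, the integrand will thus be maximized if we assign $\mathbf{y}$ to the set $\mathcal{D}_{\tilde{m}}$
(and correspondingly guess $\tilde{m}$), only if

\[ \pi_{\tilde{m}} \, f_{\mathbf{Y}|M=\tilde{m}}(\mathbf{y}) = \max_{m' \in \mathcal{M}} \bigl\{ \pi_{m'} \, f_{\mathbf{Y}|M=m'}(\mathbf{y}) \bigr\}. \]

Thus, if $\phi^*_{\text{Guess}}(\cdot)$ satisfies the theorem's hypotheses, then it maximizes the
integrand for every $\mathbf{y} \in \mathbb{R}^d$ and thus also maximizes the probability of guessing
correctly. □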
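
The set of outcomes of maximal a posteriori probability, in its form (21.15), is straightforward to compute once the prior and the conditional densities are available. The sketch below uses a hypothetical scalar Gaussian model of our choosing (given that the message is m, the observation is Gaussian of mean `means[m]`); the function names are illustrative. Detecting ties by exact floating-point equality is fragile in general and is done here only to keep the example short.

```python
import math


def gaussian_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at y."""
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def map_candidates(y_obs, prior, means, sigma=1.0):
    """Messages maximizing prior[m] * f_{Y|M=m}(y_obs), i.e., the set in (21.15).

    prior, means: dicts keyed by message; hypothetical model Y | M = m ~ N(means[m], sigma^2).
    """
    scores = {m: prior[m] * gaussian_pdf(y_obs, means[m], sigma) for m in prior}
    best = max(scores.values())
    return {m for m, s in scores.items() if s == best}


def posterior(y_obs, prior, means, sigma=1.0):
    """A posteriori probabilities per (21.12), with the 1/M convention when f_Y(y_obs) = 0."""
    scores = {m: prior[m] * gaussian_pdf(y_obs, means[m], sigma) for m in prior}
    f_y = sum(scores.values())                       # unconditional density, as in (21.11)
    if f_y > 0.0:
        return {m: s / f_y for m, s in scores.items()}
    return {m: 1.0 / len(prior) for m in prior}


if __name__ == "__main__":
    prior = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}
    means = {1: -2.0, 2: 0.0, 3: 2.0}
    print(map_candidates(0.3, prior, means))   # a single maximizer
    print(map_candidates(1.0, prior, means))   # a tie between messages 2 and 3
    print(posterior(0.3, prior, means))
```

Any deterministic rule that always returns a member of this set satisfies (21.16) and is therefore optimal.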



21.3.4 The MAP and ML Rules 

As in the binary hypothesis testing case, we can also consider randomized decision 
rules. Extending the definition of a randomized decision rule to our setting, one 
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can show using arguments very similar to those of Section 20.6 that randomization 
does not help: no randomized decision rule can yield a smaller probability of error 
than an optimal deterministic rule. But randomized decision rules can yield more 
symmetric or more "fair" rules. Indeed, we shall define the MAP rule as the 
randomized rule that resolves ties by choosing one of the messages that achieves 
the highest a posteriori probability uniformly at random: 

Definition 21.3.2 (The M-ary MAP Decision Rule). The Maximum A Poste- 
riori decision rule is the guessing rule that, after observing that Y = y bs 7 forms 
a guess by picking uniformly at random an element of the set -M(y bs)> which is 
defined in (21.14) or (21.15). 

Theorem 21.3.3 (The MAP Rule Is Optimal). For the setting of Theorem 21.3.1
the MAP decision rule is optimal in the sense that it achieves the smallest probability
of error among all deterministic or randomized decision rules. Thus,

$$p^*(\text{error}) = \sum_{m \in \mathcal{M}} \pi_m \, p_{\text{MAP}}(\text{error} \mid M = m), \tag{21.18}$$

where $p^*(\text{error})$ denotes the optimal probability of error and $p_{\text{MAP}}(\text{error} \mid M = m)$
denotes the conditional probability of error of the MAP rule.

Proof. Irrespective of the realization of the randomization that is used to pick
an element of $\tilde{\mathcal{M}}(\mathbf{y}_{\text{obs}})$, the resulting decision rule is optimal (Theorem 21.3.1).
Consequently, the average probability of error that results when we average over
this source of randomness must also be optimal. □

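The Maximum-Likelihood (ML) rule ignores the prior. It is identical to the
MAP rule when the prior is uniform. Having observed that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, the ML
decoder produces as its guess a member of the set

$$\Bigl\{ \tilde{m} \in \mathcal{M} : f_{\mathbf{Y}|M=\tilde{m}}(\mathbf{y}_{\text{obs}}) = \max_{m' \in \mathcal{M}} f_{\mathbf{Y}|M=m'}(\mathbf{y}_{\text{obs}}) \Bigr\}$$

that is drawn uniformly at random.

The ML decoder thus guesses "$M = \tilde{m}$" only if

$$f_{\mathbf{Y}|M=\tilde{m}}(\mathbf{y}_{\text{obs}}) = \max_{m' \in \mathcal{M}} f_{\mathbf{Y}|M=m'}(\mathbf{y}_{\text{obs}}). \tag{21.19}$$

(If more than one outcome achieves this maximum, it chooses uniformly at random
one of the outcomes that achieves the maximum.)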
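The MAP and ML rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `map_guess`, `ml_guess`, and `gaussian_pdf` are hypothetical, the conditional densities are passed in as ordinary Python callables, and the three scalar Gaussian hypotheses at the end are an invented example rather than anything from the text.

```python
import math
import random


def map_guess(y_obs, priors, likelihoods, rng=random):
    """Return a MAP guess for M in {1, ..., len(priors)}.

    priors[m - 1] is pi_m and likelihoods[m - 1] is a callable y -> f_{Y|M=m}(y).
    Ties are resolved uniformly at random, as in Definition 21.3.2.
    """
    scores = [p * f(y_obs) for p, f in zip(priors, likelihoods)]
    best = max(scores)
    # Messages attaining the highest score pi_m * f_{Y|M=m}(y_obs).
    candidates = [m + 1 for m, s in enumerate(scores) if s == best]
    return rng.choice(candidates)


def ml_guess(y_obs, likelihoods, rng=random):
    """ML rule: the MAP rule evaluated with a uniform prior."""
    uniform = [1.0 / len(likelihoods)] * len(likelihoods)
    return map_guess(y_obs, uniform, likelihoods, rng)


def gaussian_pdf(mean, sigma):
    """Density of a scalar N(mean, sigma^2) random variable (illustrative)."""
    return lambda y: (math.exp(-(y - mean) ** 2 / (2 * sigma ** 2))
                      / math.sqrt(2 * math.pi * sigma ** 2))


# Hypothetical example: three hypotheses with means -1, 0, 1 and unit variance.
likelihoods = [gaussian_pdf(mu, 1.0) for mu in (-1.0, 0.0, 1.0)]
print(ml_guess(0.9, likelihoods))                        # nearest mean wins
print(map_guess(0.9, [0.05, 0.9, 0.05], likelihoods))    # a strong prior can override it
```

In this example the observation 0.9 is closest to the third mean, so the ML rule guesses Message 3, whereas the heavily skewed prior makes the MAP rule guess Message 2. At the observation 0.5 the second and third likelihoods coincide, and either message may be returned.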



21.3.5 Processing 

As in Section 20.11, we say that $\mathbf{Z}$ is the result of processing $\mathbf{Y}$ with respect to M
if

M ⊸− Y ⊸− Z

forms a Markov chain. In analogy to Theorem 20.11.5, one can prove that if $\mathbf{Z}$ is
the result of processing $\mathbf{Y}$ with respect to M, then no decision rule based on $\mathbf{Z}$ can
outperform an optimal decision rule based on $\mathbf{Y}$.

Figure 21.1: Eight equiprobable hypotheses; the situation corresponds to 8-PSK.



21.4 Example: Multi-Hypothesis Testing for 2D Signals 

21.4.1 The Setup 

Consider the case where M is uniformly distributed over the set $\mathcal{M} = \{1, \ldots, M\}$
and where we would like to guess the outcome of M based on an observation
consisting of a two-dimensional random vector $\mathbf{Y}$ of components $Y^{(1)}$ and $Y^{(2)}$.
Conditional on M = m, the random variables $Y^{(1)}$ and $Y^{(2)}$ are independent
with $Y^{(1)} \sim \mathcal{N}(a_m, \sigma^2)$ and $Y^{(2)} \sim \mathcal{N}(b_m, \sigma^2)$. We assume that $\sigma^2 > 0$, so the
conditional densities can be written for every $m \in \mathcal{M}$ and every $y^{(1)}, y^{(2)} \in \mathbb{R}$ as

$$f_{Y^{(1)},Y^{(2)}|M=m}\bigl(y^{(1)}, y^{(2)}\bigr) = \frac{1}{2\pi\sigma^2} \exp\Bigl( -\frac{(y^{(1)} - a_m)^2 + (y^{(2)} - b_m)^2}{2\sigma^2} \Bigr). \tag{21.20}$$



This hypothesis testing problem is related to QAM communication over an additive
white Gaussian noise channel with a pulse shape that is orthogonal to its time shifts
by integer multiples of the baud period. The setup is demonstrated in Figure 21.1
for the special case of M = 8 with

$$a_m = A \cos\Bigl(\frac{2\pi m}{8}\Bigr), \qquad b_m = A \sin\Bigl(\frac{2\pi m}{8}\Bigr). \tag{21.21}$$

This special case is related to 8-PSK communication, where M-PSK stands for
M-ary Phase Shift Keying.



21.4.2 The "Nearest-Neighbor" Decoding Rule 



We shall next derive an optimal decision rule. For typographical reasons we shall
use $\mathbf{y}$ rather than $\mathbf{y}_{\text{obs}}$ to denote the observed vector. To find an optimal decoding
rule we note that, since M has a uniform prior, the Maximum-Likelihood rule
(21.19) is optimal. Now $\tilde{m}$ maximizes the likelihood function if, and only if,

$$\begin{aligned}
& f_{Y^{(1)},Y^{(2)}|M=\tilde{m}}\bigl(y^{(1)}, y^{(2)}\bigr) = \max_{m' \in \mathcal{M}} \bigl\{ f_{Y^{(1)},Y^{(2)}|M=m'}\bigl(y^{(1)}, y^{(2)}\bigr) \bigr\} \\
\iff{} & \frac{1}{2\pi\sigma^2} \exp\Bigl(-\frac{(y^{(1)} - a_{\tilde{m}})^2 + (y^{(2)} - b_{\tilde{m}})^2}{2\sigma^2}\Bigr) = \max_{m' \in \mathcal{M}} \Bigl\{ \frac{1}{2\pi\sigma^2} \exp\Bigl(-\frac{(y^{(1)} - a_{m'})^2 + (y^{(2)} - b_{m'})^2}{2\sigma^2}\Bigr) \Bigr\} \\
\iff{} & \exp\Bigl(-\frac{(y^{(1)} - a_{\tilde{m}})^2 + (y^{(2)} - b_{\tilde{m}})^2}{2\sigma^2}\Bigr) = \max_{m' \in \mathcal{M}} \Bigl\{ \exp\Bigl(-\frac{(y^{(1)} - a_{m'})^2 + (y^{(2)} - b_{m'})^2}{2\sigma^2}\Bigr) \Bigr\} \\
\iff{} & -\frac{(y^{(1)} - a_{\tilde{m}})^2 + (y^{(2)} - b_{\tilde{m}})^2}{2\sigma^2} = \max_{m' \in \mathcal{M}} \Bigl\{ -\frac{(y^{(1)} - a_{m'})^2 + (y^{(2)} - b_{m'})^2}{2\sigma^2} \Bigr\} \\
\iff{} & \frac{(y^{(1)} - a_{\tilde{m}})^2 + (y^{(2)} - b_{\tilde{m}})^2}{2\sigma^2} = \min_{m' \in \mathcal{M}} \Bigl\{ \frac{(y^{(1)} - a_{m'})^2 + (y^{(2)} - b_{m'})^2}{2\sigma^2} \Bigr\} \\
\iff{} & (y^{(1)} - a_{\tilde{m}})^2 + (y^{(2)} - b_{\tilde{m}})^2 = \min_{m' \in \mathcal{M}} \bigl\{ (y^{(1)} - a_{m'})^2 + (y^{(2)} - b_{m'})^2 \bigr\} \\
\iff{} & \|\mathbf{y} - \mathbf{s}_{\tilde{m}}\| = \min_{m' \in \mathcal{M}} \bigl\{ \|\mathbf{y} - \mathbf{s}_{m'}\| \bigr\},
\end{aligned}$$

where $\mathbf{y} = (y^{(1)}, y^{(2)})^{\mathsf{T}}$, $\mathbf{s}_m = (a_m, b_m)^{\mathsf{T}}$ for $m \in \mathcal{M}$, and $\|\cdot\|$ denotes the Euclidean
distance (23.4). It is thus seen that the ML rule (which is equivalent to the MAP
rule because the prior is uniform) is equivalent to a "nearest-neighbor" decoding
rule, which chooses the hypothesis under which the mean vector is closest to the
observed vector (with ties being resolved at random). Figure 21.2 depicts the
nearest-neighbor decoding rule for 8-PSK. The shaded region corresponds to the
set of observables that result in the guess "M = 1," i.e., the set of points that are
nearest to $(A\cos(2\pi/8), A\sin(2\pi/8))$.

Figure 21.2: Shaded region corresponds to observations leading the ML rule to
guess "M = 1."

Figure 21.3: Contour lines of the density $f_{Y^{(1)},Y^{(2)}|M=4}(\cdot)$. Shaded region corresponds
to guessing "M = 4."
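The nearest-neighbor rule of Section 21.4.2 is easy to express in code. The following is a minimal sketch, assuming the constellation (21.21); the function names `psk_constellation` and `nearest_neighbor` are hypothetical, and ties are broken by lowest index rather than at random, which the text notes has no effect on the error probability when ties occur with probability zero.

```python
import math


def psk_constellation(num_points, amplitude):
    """Mean vectors (a_m, b_m) of (21.21) for m = 1, ..., num_points."""
    return [(amplitude * math.cos(2 * math.pi * m / num_points),
             amplitude * math.sin(2 * math.pi * m / num_points))
            for m in range(1, num_points + 1)]


def nearest_neighbor(y, means):
    """Return the (1-based) message whose mean vector is closest to y.

    min() resolves ties by lowest index; this is an implementation shortcut,
    not the randomized tie-breaking of the MAP rule.
    """
    return min(range(1, len(means) + 1),
               key=lambda m: math.dist(y, means[m - 1]))


means = psk_constellation(8, amplitude=1.0)
print(nearest_neighbor((-0.9, 0.1), means))   # observation near (-A, 0)
```

With A = 1 the observation (-0.9, 0.1) lies close to the constellation point (-A, 0) of Message 4, so the decoder returns 4. Note that the decoder never uses the noise variance.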



21.4.3 Exact Error Analysis for 8-PSK 

The analysis of the probability of error can be a bit tricky. Here we only present 
the analysis for 8-PSK. If nothing else, it will motivate us to seek more easily 
computable bounds. 

We shall compute the probability of error conditional on M = 4. But there is 
nothing special about this choice; the rotational symmetry of the problem implies 
that the probability of error does not depend on the hypothesis. 

Conditional on M = 4, the observables $(Y^{(1)}, Y^{(2)})^{\mathsf{T}}$ can be expressed as

$$\bigl(Y^{(1)}, Y^{(2)}\bigr)^{\mathsf{T}} = (-A, 0)^{\mathsf{T}} + \bigl(Z^{(1)}, Z^{(2)}\bigr)^{\mathsf{T}},$$

where $Z^{(1)}$ and $Z^{(2)}$ are independent $\mathcal{N}(0, \sigma^2)$ random variables:

$$f_{Z^{(1)},Z^{(2)}}\bigl(z^{(1)}, z^{(2)}\bigr) = \frac{1}{2\pi\sigma^2} \exp\Bigl(-\frac{(z^{(1)})^2 + (z^{(2)})^2}{2\sigma^2}\Bigr), \quad z^{(1)}, z^{(2)} \in \mathbb{R}.$$

Figure 21.3 depicts the contour lines of the density $f_{Y^{(1)},Y^{(2)}|M=4}(\cdot)$, which are
centered on the mean $(a_4, b_4) = (-A, 0)$. Note that $f_{Y^{(1)},Y^{(2)}|M=4}(\cdot)$ is symmetric
about the horizontal axis:

$$f_{Y^{(1)},Y^{(2)}|M=4}\bigl(y^{(1)}, -y^{(2)}\bigr) = f_{Y^{(1)},Y^{(2)}|M=4}\bigl(y^{(1)}, y^{(2)}\bigr), \quad y^{(1)}, y^{(2)} \in \mathbb{R}. \tag{21.22}$$

The shaded region in the figure is the set of pairs $(y^{(1)}, y^{(2)})$ that cause the nearest-neighbor
decoder to guess "M = 4."³ Conditional on M = 4 an error results if
$(Y^{(1)}, Y^{(2)})$ is outside the shaded region.

Referring now to Figure 21.4 we need to compute the probability that the noise
$(Z^{(1)}, Z^{(2)})$ causes the received signal to lie in the union of the shaded areas. The
symmetry of $f_{Y^{(1)},Y^{(2)}|M=4}(\cdot)$ about the horizontal axis (21.22) implies that the
probability that the received vector lies in the darkly-shaded region is the same as
the probability that it lies in the lightly-shaded region. We shall thus compute the
probability of the latter and double the result.

3 It can be shown that the probability that the observation lies exactly on the boundary of
the region is zero; see Proposition 21.6.2 ahead. We shall thus ignore this possibility.

Figure 21.4: Error analysis for 8-PSK.

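Let $\psi = \pi/8$ denote half the angle between the constellation points. To carry out
the integration we shall use polar coordinates $(r, \theta)$ centered on the constellation
point $(-A, 0)$ corresponding to Message 4:

$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid M = 4) &= 2 \int_0^{\pi - \psi} \int_{\rho(\theta)}^{\infty} \frac{1}{2\pi\sigma^2} \, e^{-\frac{r^2}{2\sigma^2}} \, r \, dr \, d\theta \\
&= \frac{1}{\pi} \int_0^{\pi - \psi} \int_{\rho^2(\theta)/(2\sigma^2)}^{\infty} e^{-u} \, du \, d\theta \\
&= \frac{1}{\pi} \int_0^{\pi - \psi} e^{-\frac{\rho^2(\theta)}{2\sigma^2}} \, d\theta,
\end{aligned} \tag{21.23}$$

where $\rho(\theta)$ is the distance we travel from the point $(-A, 0)$ at angle $\theta$ until we
reach the lightly-shaded region, and where the second equality follows using the
substitution $u = r^2/(2\sigma^2)$. Using the law of sines we have

$$\rho(\theta) = \frac{A \sin\psi}{\sin(\theta + \psi)}. \tag{21.24}$$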



Since the symmetry of the problem implies that the conditional probability of error
conditioned on M = m does not depend on m, it follows from (21.23), (21.24), and
(21.18) that

$$p^*(\text{error}) = \frac{1}{\pi} \int_0^{\pi - \psi} \exp\Bigl(-\frac{A^2 \sin^2\psi}{2 \sin^2(\theta + \psi) \, \sigma^2}\Bigr) \, d\theta, \quad \psi = \frac{\pi}{8}. \tag{21.25}$$
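The integral in (21.25) has no elementary closed form, but it is straightforward to evaluate numerically. The sketch below uses a plain midpoint rule; the function name `exact_error_8psk`, the number of steps, and the sample parameters are assumptions made for illustration only.

```python
import math


def exact_error_8psk(amplitude, sigma, steps=20000):
    """Midpoint-rule approximation of the integral in (21.25)."""
    psi = math.pi / 8
    upper = math.pi - psi
    h = upper / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h                      # midpoint of the k-th cell
        rho = amplitude * math.sin(psi) / math.sin(theta + psi)   # (21.24)
        total += math.exp(-rho ** 2 / (2 * sigma ** 2))
    return total * h / math.pi


print(exact_error_8psk(amplitude=1.0, sigma=0.3))
```

One quick sanity check: as the noise variance grows, the integrand tends to one, so the value approaches (π − ψ)/π = 7/8, which is the error probability of guessing blindly among eight equiprobable messages.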



21.5 The Union-of-Events Bound 



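Although simple, the Union-of-Events Bound, or Union Bound for short, is an
extremely powerful and useful bound.⁴ To derive it, recall that one of the axioms
of probability is that the probability of the union of two disjoint events is the sum
of their probabilities.⁵ Given two not necessarily disjoint events $\mathcal{V}$ and $\mathcal{W}$, we can
express the set $\mathcal{V}$ as in Figure 21.5 as the union of those elements of $\mathcal{V}$ that are not
in $\mathcal{W}$ and those that are both in $\mathcal{V}$ and in $\mathcal{W}$:

$$\mathcal{V} = (\mathcal{V} \setminus \mathcal{W}) \cup (\mathcal{V} \cap \mathcal{W}). \tag{21.26}$$

Because the sets $\mathcal{V} \setminus \mathcal{W}$ and $\mathcal{V} \cap \mathcal{W}$ are disjoint, and because their union is $\mathcal{V}$, it
follows that $\Pr(\mathcal{V}) = \Pr(\mathcal{V} \setminus \mathcal{W}) + \Pr(\mathcal{V} \cap \mathcal{W})$, which can also be written as

$$\Pr(\mathcal{V} \setminus \mathcal{W}) = \Pr(\mathcal{V}) - \Pr(\mathcal{V} \cap \mathcal{W}). \tag{21.27}$$

Writing the union $\mathcal{V} \cup \mathcal{W}$ as the union of two disjoint sets

$$\mathcal{V} \cup \mathcal{W} = \mathcal{W} \cup (\mathcal{V} \setminus \mathcal{W}) \tag{21.28}$$

as in Figure 21.6, we conclude that

$$\Pr(\mathcal{V} \cup \mathcal{W}) = \Pr(\mathcal{W}) + \Pr(\mathcal{V} \setminus \mathcal{W}), \tag{21.29}$$

which combines with (21.27) to prove that

$$\Pr(\mathcal{V} \cup \mathcal{W}) = \Pr(\mathcal{V}) + \Pr(\mathcal{W}) - \Pr(\mathcal{V} \cap \mathcal{W}). \tag{21.30}$$

Since probabilities are nonnegative, it follows from (21.30) that

$$\Pr(\mathcal{V} \cup \mathcal{W}) \le \Pr(\mathcal{V}) + \Pr(\mathcal{W}), \tag{21.31}$$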



which is the Union Bound.

4 It is also sometimes called Boole's Inequality.

5 Actually the axiom is stronger; it states that this holds also for a countably infinite number
of sets.

Figure 21.5: Diagram of two nondisjoint sets.

Figure 21.6: Diagram of the union of two nondisjoint sets.

This bound can also be extended to derive an upper
bound on the union of more sets. For example, we can show that for three events
$\mathcal{U}, \mathcal{V}, \mathcal{W}$ we have $\Pr(\mathcal{U} \cup \mathcal{V} \cup \mathcal{W}) \le \Pr(\mathcal{U}) + \Pr(\mathcal{V}) + \Pr(\mathcal{W})$. Indeed, by first applying
the claim to the two sets $\mathcal{U}$ and $(\mathcal{V} \cup \mathcal{W})$ we obtain

$$\begin{aligned}
\Pr(\mathcal{U} \cup \mathcal{V} \cup \mathcal{W}) &= \Pr\bigl(\mathcal{U} \cup (\mathcal{V} \cup \mathcal{W})\bigr) \\
&\le \Pr(\mathcal{U}) + \Pr(\mathcal{V} \cup \mathcal{W}) \\
&\le \Pr(\mathcal{U}) + \Pr(\mathcal{V}) + \Pr(\mathcal{W}),
\end{aligned}$$

where the last inequality follows by applying the inequality to the two sets $\mathcal{V}$ and $\mathcal{W}$.
One can continue the argument by induction for a finite⁶ collection of events to
obtain:



Theorem 21.5.1 (Union-of-Events Bound). If $\mathcal{V}_1, \mathcal{V}_2, \ldots$ is a finite or countably
infinite collection of events then

$$\Pr\Bigl(\bigcup_j \mathcal{V}_j\Bigr) \le \sum_j \Pr(\mathcal{V}_j). \tag{21.32}$$

We can think about the LHS of (21.32) as the probability that at least one of
the events $\mathcal{V}_1, \mathcal{V}_2, \ldots$ occurs and of its RHS as the expected number of events that
occur. Indeed, if for each j we define the random variables $X_j(\omega) = I\{\omega \in \mathcal{V}_j\}$ for
all $\omega \in \Omega$, then the LHS of (21.32) is equal to $\Pr\bigl[\sum_j X_j > 0\bigr]$, and the RHS is
$\sum_j E[X_j]$, which can also be expressed as $E\bigl[\sum_j X_j\bigr]$.

After the trivial bound that the probability of any event cannot exceed one, the 
Union Bound is probably the most important bound in Probability Theory. What 
makes it so useful is the fact that the RHS of (21.32) can be computed without 
regard to any dependencies between the events. 
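To make the two readings of (21.32) concrete, the following sketch evaluates both sides exactly on a small finite sample space. The sample space (one roll of a fair die) and the three overlapping events are invented for illustration; exact rational arithmetic avoids any rounding questions.

```python
from fractions import Fraction

# Illustrative sample space: one roll of a fair die.
omega = range(1, 7)
prob = {w: Fraction(1, 6) for w in omega}

# Three overlapping (hence dependent) events.
events = [{2, 4, 6},   # the outcome is even
          {4, 5, 6},   # the outcome is at least four
          {3, 6}]      # the outcome is a multiple of three


def pr(event):
    """Probability of an event, i.e. of a subset of the sample space."""
    return sum(prob[w] for w in event)


lhs = pr(set().union(*events))        # Pr[at least one event occurs]
rhs = sum(pr(v) for v in events)      # sum of the individual probabilities
# Expected number of events that occur, E[sum_j X_j], computed directly.
expected_count = sum(prob[w] * sum(w in v for v in events) for w in omega)

print(lhs, rhs, expected_count)       # prints: 5/6 4/3 4/3
```

The right-hand side exceeds one here, which is perfectly legitimate for an expected count and illustrates why the bound is only useful when the individual probabilities are small.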

Corollary 21.5.2.

(i) If each of a finite (or countably infinite) collection of events occurs with probability
zero, then their union also occurs with probability zero.

(ii) If each of a finite (or countably infinite) collection of events occurs with probability
one, then their intersection also occurs with probability one.

6 In fact, this claim holds for a countably infinite number of events.

Proof. To prove Part (i) we assume that each of the events $\mathcal{V}_1, \mathcal{V}_2, \ldots$ is of zero
probability and compute

$$\Pr\Bigl(\bigcup_j \mathcal{V}_j\Bigr) \le \sum_j \Pr(\mathcal{V}_j) = \sum_j 0 = 0,$$

where the first inequality follows from the Union Bound, and where the subsequent
equality follows from our assumption that $\Pr(\mathcal{V}_j) = 0$, for all j.

To prove Part (ii) we assume that each of the events $\mathcal{W}_1, \mathcal{W}_2, \ldots$ occurs with
probability one and apply Part (i) to the sets $\mathcal{V}_1, \mathcal{V}_2, \ldots$, where $\mathcal{V}_j$ is the set-complement
of $\mathcal{W}_j$, i.e., $\mathcal{V}_j = \Omega \setminus \mathcal{W}_j$:

$$\Pr\Bigl(\bigcap_j \mathcal{W}_j\Bigr) = 1 - \Pr\Bigl(\Bigl(\bigcap_j \mathcal{W}_j\Bigr)^{\mathrm{c}}\Bigr) = 1 - \Pr\Bigl(\bigcup_j \mathcal{V}_j\Bigr) = 1,$$

where the first equality follows because the probabilities of an event and its complement
sum to one; the second because the complement of an intersection is the
union of the complements; and the final equality follows from Part (i) because
the events $\mathcal{W}_j$ are, by assumption, of probability one so their complements are of
probability zero. □

21.5.1 Applications to Hypothesis Testing 

We shall now use the Union Bound to derive an upper bound on the conditional
probability of error $p_{\text{MAP}}(\text{error} \mid M = m)$ of the MAP decoding rule. The bound
we derive is applicable to any decision rule that satisfies the hypothesis of Theorem 21.3.1
as expressed in (21.16).

Define for every $m' \ne m$ the set $\mathcal{B}_{m,m'} \subset \mathbb{R}^d$ by

$$\mathcal{B}_{m,m'} = \bigl\{ \mathbf{y} \in \mathbb{R}^d : \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}) \ge \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}) \bigr\}. \tag{21.33}$$

Notice that $\mathbf{y} \in \mathcal{B}_{m,m'}$ does not imply that the MAP rule will guess $m'$: there may
be a third hypothesis that is a posteriori even more likely than either $m$ or $m'$.
Also, since the inequality in (21.33) is not strict, $\mathbf{y} \in \mathcal{B}_{m,m'}$ does not imply that
the MAP rule will not guess m: there may be a tie, which may be resolved in favor
of m. As we next argue, what is true is that if m was not guessed by the MAP rule,
then some $m'$ which is not equal to m must have had an a posteriori probability
that is at least as high as that of m:

$$\bigl( m \text{ not guessed} \bigr) \implies \Bigl( \mathbf{Y} \in \bigcup_{m' \ne m} \mathcal{B}_{m,m'} \Bigr). \tag{21.34}$$

Indeed, if m was not guessed by the MAP rule, then some other message was.
Denoting that other message by $m'$, we note that $\pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y})$ must be at least
as large as $\pi_m f_{\mathbf{Y}|M=m}(\mathbf{y})$ (because otherwise $m'$ would not have been guessed),
so $\mathbf{y} \in \mathcal{B}_{m,m'}$.

Continuing from (21.34), we note that if the occurrence of an event $\mathcal{E}_1$ implies the
occurrence of an event $\mathcal{E}_2$, then $\Pr(\mathcal{E}_1) \le \Pr(\mathcal{E}_2)$. Consequently, by (21.34),

$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid M = m) &\le \Pr\Bigl[ \mathbf{Y} \in \bigcup_{m' \ne m} \mathcal{B}_{m,m'} \Bigm| M = m \Bigr] \\
&= \Pr\Bigl( \bigcup_{m' \ne m} \bigl\{ \omega \in \Omega : \mathbf{Y}(\omega) \in \mathcal{B}_{m,m'} \bigr\} \Bigm| M = m \Bigr) \\
&\le \sum_{m' \ne m} \Pr\bigl( \{ \omega \in \Omega : \mathbf{Y}(\omega) \in \mathcal{B}_{m,m'} \} \bigm| M = m \bigr) \\
&= \sum_{m' \ne m} \Pr[ \mathbf{Y} \in \mathcal{B}_{m,m'} \mid M = m ] \\
&= \sum_{m' \ne m} \int_{\mathcal{B}_{m,m'}} f_{\mathbf{Y}|M=m}(\mathbf{y}) \, d\mathbf{y}.
\end{aligned}$$

We have thus derived:
We have thus derived: 



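Proposition 21.5.3. For the setup of Theorem 21.3.1 let $p_{\text{MAP}}(\text{error} \mid M = m)$
denote the conditional probability of error conditional on M = m of the MAP rule
for guessing M based on $\mathbf{Y}$. Then,

$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid M = m) &\le \sum_{m' \ne m} \Pr[ \mathbf{Y} \in \mathcal{B}_{m,m'} \mid M = m ] && (21.35) \\
&= \sum_{m' \ne m} \int_{\mathcal{B}_{m,m'}} f_{\mathbf{Y}|M=m}(\mathbf{y}) \, d\mathbf{y}, && (21.36)
\end{aligned}$$

where

$$\mathcal{B}_{m,m'} = \bigl\{ \mathbf{y} \in \mathbb{R}^d : \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}) \ge \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}) \bigr\}. \tag{21.37}$$

This bound is applicable to any decision rule satisfying the hypothesis of Theorem 21.3.1
as expressed in (21.16).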

The term $\Pr(\mathbf{Y} \in \mathcal{B}_{m,m'} \mid M = m)$ has an interesting interpretation. If ties occur
with probability zero, then it corresponds to the conditional probability of error
(given that M = m) incurred by a MAP decoder designed for the binary hypothesis
testing problem of guessing whether $M = m$ or $M = m'$ when the prior probability
that $M = m$ is $\pi_m/(\pi_m + \pi_{m'})$ and that $M = m'$ is $\pi_{m'}/(\pi_m + \pi_{m'})$.

Alternatively, we can write (21.35) as

$$p_{\text{MAP}}(\text{error} \mid M = m) \le \sum_{m' \ne m} \Pr\bigl[ \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{Y}) \ge \pi_m f_{\mathbf{Y}|M=m}(\mathbf{Y}) \bigm| M = m \bigr]. \tag{21.38}$$

Figure 21.7: Error events for 8-PSK conditional on M = 4.



21.5.2 Example: The Union Bound for 8-PSK 

We next apply the Union Bound to upper-bound the probability of error associated
with maximum-likelihood decoding of 8-PSK. For concreteness we focus on the
conditional probability of error, conditional on M = 4. We shall see that in this
case the RHS of (21.35) is still an upper bound on the probability of error even if
we do not sum over all $m'$ that differ from m. Indeed, as we next argue, in upper-bounding
the conditional probability of error of the ML decoder given M = 4, it
suffices to sum over $m' \in \{3, 5\}$ only.

To show this we first note that for this problem the set $\mathcal{B}_{m,m'}$ of (21.33) corresponds
to the set of vectors that are at least as close to $(a_{m'}, b_{m'})$ as to $(a_m, b_m)$:

$$\mathcal{B}_{m,m'} = \bigl\{ \mathbf{y} \in \mathbb{R}^2 : (y^{(1)} - a_{m'})^2 + (y^{(2)} - b_{m'})^2 \le (y^{(1)} - a_m)^2 + (y^{(2)} - b_m)^2 \bigr\}.$$

As seen in Figure 21.7, given M = 4, an error will occur only if the observed
vector $\mathbf{Y}$ is at least as close to $(a_3, b_3)$ as to $(a_4, b_4)$, or if it is at least as close
to $(a_5, b_5)$ as to $(a_4, b_4)$. Thus, conditional on M = 4, an error can occur only if
$\mathbf{Y} \in \mathcal{B}_{4,3} \cup \mathcal{B}_{4,5}$. (If $\mathbf{Y} \notin \mathcal{B}_{4,3} \cup \mathcal{B}_{4,5}$, then an error will certainly not occur. If
$\mathbf{Y} \in \mathcal{B}_{4,3} \cup \mathcal{B}_{4,5}$, then an error may or may not occur. It will not occur in the case
of a tie, which corresponds to $\mathbf{Y}$ being on the boundary of $\mathcal{B}_{4,3} \cup \mathcal{B}_{4,5}$, provided that
the tie is resolved in favor of M = 4.)

Note that the events $\mathbf{Y} \in \mathcal{B}_{4,5}$ and $\mathbf{Y} \in \mathcal{B}_{4,3}$ are not mutually exclusive, but,
nevertheless, by the Union-of-Events Bound

$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid M = 4) &\le \Pr[ \mathbf{Y} \in \mathcal{B}_{4,3} \cup \mathcal{B}_{4,5} \mid M = 4 ] \\
&\le \Pr[ \mathbf{Y} \in \mathcal{B}_{4,3} \mid M = 4 ] + \Pr[ \mathbf{Y} \in \mathcal{B}_{4,5} \mid M = 4 ],
\end{aligned} \tag{21.39}$$

where the first inequality follows because, conditional on M = 4, an error can
occur only if $\mathbf{Y} \in \mathcal{B}_{4,3} \cup \mathcal{B}_{4,5}$; and where the second inequality follows from the
Union-of-Events Bound. In fact, the first inequality holds with equality because,
for this problem, the probability of a tie is zero; see Proposition 21.6.2 ahead.

From our analysis of multi-dimensional binary hypothesis testing (Lemma 20.14.1)
we obtain that

$$\Pr[ \mathbf{Y} \in \mathcal{B}_{4,3} \mid M = 4 ] = Q\Bigl( \frac{\sqrt{(a_4 - a_3)^2 + (b_4 - b_3)^2}}{2\sigma} \Bigr) = Q\Bigl( \frac{A}{\sigma} \sin\Bigl(\frac{\pi}{8}\Bigr) \Bigr) \tag{21.40}$$

and

$$\Pr[ \mathbf{Y} \in \mathcal{B}_{4,5} \mid M = 4 ] = Q\Bigl( \frac{\sqrt{(a_4 - a_5)^2 + (b_4 - b_5)^2}}{2\sigma} \Bigr) = Q\Bigl( \frac{A}{\sigma} \sin\Bigl(\frac{\pi}{8}\Bigr) \Bigr). \tag{21.41}$$

Combining (21.39), (21.40), and (21.41) we obtain

$$p_{\text{MAP}}(\text{error} \mid M = 4) \le 2 Q\Bigl( \frac{A}{\sigma} \sin\Bigl(\frac{\pi}{8}\Bigr) \Bigr). \tag{21.42}$$

This is only an upper bound and not the exact error probability because the sets
$\mathcal{B}_{4,3}$ and $\mathcal{B}_{4,5}$ are not disjoint, so the events $\mathbf{Y} \in \mathcal{B}_{4,3}$ and $\mathbf{Y} \in \mathcal{B}_{4,5}$ are not disjoint
and the Union Bound is not tight; see Figure 21.7.

For this symmetric problem the conditional probability of error conditional on
M = m does not depend on the message m, and we thus also have by (21.18)

$$p^*(\text{error}) \le 2 Q\Bigl( \frac{A}{\sigma} \sin\Bigl(\frac{\pi}{8}\Bigr) \Bigr). \tag{21.43}$$
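The slack in (21.42) is exactly the probability of the overlap of the two sets, since by (21.30) the probability of the union equals the sum of the two probabilities minus the probability of the intersection. The sketch below checks this by simulation. It is a rough illustration only: the function names, the seed, the number of trials, and the parameter values are assumptions, and the Q-function is computed from the standard identity Q(x) = erfc(x/√2)/2.

```python
import math
import random


def q_function(x):
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def union_bound_8psk(amplitude, sigma):
    """Right-hand side of (21.42)."""
    return 2 * q_function(amplitude / sigma * math.sin(math.pi / 8))


def simulate_8psk(amplitude, sigma, trials, seed=0):
    """Estimate Pr[Y in B43 or B45 | M=4] and Pr[Y in B43 and B45 | M=4]."""
    rng = random.Random(seed)
    s3, s4, s5 = [(amplitude * math.cos(2 * math.pi * m / 8),
                   amplitude * math.sin(2 * math.pi * m / 8)) for m in (3, 4, 5)]
    errors = overlaps = 0
    for _ in range(trials):
        # Conditional on M = 4 the observation is s4 plus independent noise.
        y = (s4[0] + rng.gauss(0, sigma), s4[1] + rng.gauss(0, sigma))
        in_b43 = math.dist(y, s3) <= math.dist(y, s4)
        in_b45 = math.dist(y, s5) <= math.dist(y, s4)
        errors += in_b43 or in_b45
        overlaps += in_b43 and in_b45
    return errors / trials, overlaps / trials


bound = union_bound_8psk(1.0, 0.5)
error_rate, overlap_rate = simulate_8psk(1.0, 0.5, trials=100_000)
print(bound, error_rate, overlap_rate)
```

Up to simulation noise, the estimated error rate plus the estimated overlap rate should reproduce the bound. The overlap shrinks quickly as σ decreases, which is why the Union Bound is tight at high signal-to-noise ratios.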



21.5.3 Union-Bhattacharyya Bound 

We next derive a bound which is looser than the Union Bound but which is often
easier to evaluate in non-Gaussian settings. It is the multi-hypothesis testing
version of the Bhattacharyya Bound (20.50).

Recall that, by Theorem 21.3.1, any guessing rule whose guess after observing that
$\mathbf{Y} = \mathbf{y}_{\text{obs}}$ is in the set

$$\tilde{\mathcal{M}}(\mathbf{y}_{\text{obs}}) = \Bigl\{ \tilde{m} \in \mathcal{M} : \pi_{\tilde{m}} f_{\mathbf{Y}|M=\tilde{m}}(\mathbf{y}_{\text{obs}}) = \max_{m' \in \mathcal{M}} \bigl\{ \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}_{\text{obs}}) \bigr\} \Bigr\}$$

is optimal. To analyze the optimal probability of error $p^*(\text{error})$, we shall analyze
one particular optimal decision rule. This rule is not the MAP rule, but it differs
from the MAP rule only in the way it resolves ties. Rather than resolving ties at
random, this rule resolves ties according to the index of the hypothesis: it chooses
the message in $\tilde{\mathcal{M}}(\mathbf{y}_{\text{obs}})$ of smallest index. For example, if the messages of highest
a posteriori probability are Messages 7, 9, and 17, i.e., if $\tilde{\mathcal{M}}(\mathbf{y}_{\text{obs}}) = \{7, 9, 17\}$, then
it guesses "7." This decision rule may not appeal to the reader's sense of fairness
but, by Theorem 21.3.1, it is nonetheless optimal. Consequently, if we denote the
conditional probability of error of this decoder by $p(\text{error} \mid M = m)$, then

$$p^*(\text{error}) = \sum_{m \in \mathcal{M}} \pi_m \, p(\text{error} \mid M = m). \tag{21.44}$$
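We next analyze the performance of this decision rule. For every $m' \ne m$ let

$$\mathcal{D}_{m,m'} = \begin{cases}
\bigl\{ \mathbf{y} \in \mathbb{R}^d : \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}) \ge \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}) \bigr\} & \text{if } m' < m, \\
\bigl\{ \mathbf{y} \in \mathbb{R}^d : \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}) > \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}) \bigr\} & \text{if } m' > m.
\end{cases} \tag{21.45}$$

Notice that

$$\mathcal{D}_{m,m'}^{\mathrm{c}} = \mathcal{D}_{m',m}, \quad m \ne m'. \tag{21.46}$$

Conditional on M = m, our detector will err if, and only if, $\mathbf{y}_{\text{obs}} \in \bigcup_{m' \ne m} \mathcal{D}_{m,m'}$.
Thus

$$\begin{aligned}
\Pr(\text{error} \mid M = m) &= \Pr\Bigl[ \mathbf{Y} \in \bigcup_{m' \ne m} \mathcal{D}_{m,m'} \Bigm| M = m \Bigr] \\
&= \Pr\Bigl( \bigcup_{m' \ne m} \bigl\{ \omega \in \Omega : \mathbf{Y}(\omega) \in \mathcal{D}_{m,m'} \bigr\} \Bigm| M = m \Bigr) \\
&\le \sum_{m' \ne m} \Pr[ \mathbf{Y} \in \mathcal{D}_{m,m'} \mid M = m ] \\
&= \sum_{m' \ne m} \int_{\mathcal{D}_{m,m'}} f_{\mathbf{Y}|M=m}(\mathbf{y}) \, d\mathbf{y},
\end{aligned} \tag{21.47}$$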

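where the inequality follows from the Union Bound. To upper-bound $p^*(\text{error})$ we
use (21.44) and (21.47) to obtain

$$\begin{aligned}
p^*(\text{error}) &= \sum_{m=1}^{M} \pi_m \Pr(\text{error} \mid M = m) \\
&\le \sum_{m=1}^{M} \pi_m \sum_{m' \ne m} \int_{\mathcal{D}_{m,m'}} f_{\mathbf{Y}|M=m}(\mathbf{y}) \, d\mathbf{y} \\
&= \sum_{m=1}^{M} \sum_{m' > m} \Bigl( \pi_m \int_{\mathcal{D}_{m,m'}} f_{\mathbf{Y}|M=m}(\mathbf{y}) \, d\mathbf{y} + \pi_{m'} \int_{\mathcal{D}_{m',m}} f_{\mathbf{Y}|M=m'}(\mathbf{y}) \, d\mathbf{y} \Bigr) \\
&= \sum_{m=1}^{M} \sum_{m' > m} \Bigl( \int_{\mathcal{D}_{m,m'}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}) \, d\mathbf{y} + \int_{\mathcal{D}_{m,m'}^{\mathrm{c}}} \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}) \, d\mathbf{y} \Bigr) \\
&= \sum_{m=1}^{M} \sum_{m' > m} \int_{\mathbb{R}^d} \min\bigl\{ \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}), \, \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}) \bigr\} \, d\mathbf{y} \\
&\le \sum_{m=1}^{M} \sum_{m' > m} \sqrt{\pi_m \pi_{m'}} \int_{\mathbb{R}^d} \sqrt{ f_{\mathbf{Y}|M=m}(\mathbf{y}) \, f_{\mathbf{Y}|M=m'}(\mathbf{y}) } \, d\mathbf{y} \\
&\le \sum_{m=1}^{M} \sum_{m' > m} \frac{\pi_m + \pi_{m'}}{2} \int_{\mathbb{R}^d} \sqrt{ f_{\mathbf{Y}|M=m}(\mathbf{y}) \, f_{\mathbf{Y}|M=m'}(\mathbf{y}) } \, d\mathbf{y} \\
&= \frac{1}{2} \sum_{m=1}^{M} \sum_{m' \ne m} \frac{\pi_m + \pi_{m'}}{2} \int_{\mathbb{R}^d} \sqrt{ f_{\mathbf{Y}|M=m}(\mathbf{y}) \, f_{\mathbf{Y}|M=m'}(\mathbf{y}) } \, d\mathbf{y},
\end{aligned}$$

where the equality in the first line follows from (21.44); the inequality in the second
line from (21.47); the equality in the third line by rearranging the sum; the equality
in the fourth line from (21.46); the equality in the fifth line from the definition of
the set $\mathcal{D}_{m,m'}$; the inequality in the sixth line from the inequality $\min\{a, b\} \le \sqrt{ab}$,
which holds for all nonnegative $a, b \in \mathbb{R}$ (see (20.48)); the inequality in the seventh
line from the Arithmetic-Geometric Inequality $\sqrt{cd} \le (c + d)/2$, which holds for
all $c, d \ge 0$ (see (20.49)); and the final equality by the symmetry of the summand.
We have thus obtained the Union-Bhattacharyya Bound:

$$p^*(\text{error}) \le \frac{1}{2} \sum_{m \in \mathcal{M}} \sum_{m' \ne m} \frac{\pi_m + \pi_{m'}}{2} \int_{\mathbb{R}^d} \sqrt{ f_{\mathbf{Y}|M=m}(\mathbf{y}) \, f_{\mathbf{Y}|M=m'}(\mathbf{y}) } \, d\mathbf{y}. \tag{21.48}$$

For a priori equally likely hypotheses it takes the form

$$p^*(\text{error}) \le \frac{1}{2M} \sum_{m \in \mathcal{M}} \sum_{m' \ne m} \int_{\mathbb{R}^d} \sqrt{ f_{\mathbf{Y}|M=m}(\mathbf{y}) \, f_{\mathbf{Y}|M=m'}(\mathbf{y}) } \, d\mathbf{y}, \tag{21.49}$$

which is the Union-Bhattacharyya Bound for M-ary hypothesis testing with a
uniform prior.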
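As an illustration of (21.49), the sketch below evaluates the bound when the conditional densities are Gaussian with mean vectors $\mathbf{s}_m$ and common covariance $\sigma^2$ times the identity. It relies on a closed form that is not derived in this section and is therefore an assumption here: completing the square gives $\int \sqrt{f_m f_{m'}} \, d\mathbf{y} = \exp(-\|\mathbf{s}_m - \mathbf{s}_{m'}\|^2 / (8\sigma^2))$. The function names and the 8-PSK parameters are likewise hypothetical.

```python
import math


def bhattacharyya_gaussian(s1, s2, sigma):
    """Integral of sqrt(f1 * f2) for N(s1, sigma^2 I) and N(s2, sigma^2 I).

    Uses the closed form exp(-||s1 - s2||^2 / (8 sigma^2)), an assumption
    imported from outside this section.
    """
    return math.exp(-math.dist(s1, s2) ** 2 / (8 * sigma ** 2))


def union_bhattacharyya_uniform(means, sigma):
    """Right-hand side of (21.49) for a uniform prior."""
    total = sum(bhattacharyya_gaussian(a, b, sigma)
                for i, a in enumerate(means)
                for j, b in enumerate(means) if i != j)
    return total / (2 * len(means))


# 8-PSK constellation of (21.21) with A = 1 (illustrative values).
means = [(math.cos(2 * math.pi * m / 8), math.sin(2 * math.pi * m / 8))
         for m in range(1, 9)]
print(union_bhattacharyya_uniform(means, sigma=0.3))
```

For two hypotheses the double sum has two identical terms, so the function returns half the Bhattacharyya integral, which matches the form of the binary bound with a uniform prior.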

21.6 Multi-Dimensional M-ary Gaussian Hypothesis Testing 

We next use Theorem 21.3.3 to study the multi-hypothesis testing version of the
problem we addressed in Section 20.14. We begin with the problem setup and then
proceed to derive the MAP decision rule. We then assess the performance of this 
rule by deriving an upper bound and a lower bound on its probability of error. 



21.6.1 Problem Setup 

A random variable M takes value in the set $\mathcal{M} = \{1, \ldots, M\}$ with a nondegenerate
prior (21.4). We wish to guess M based on an observation consisting of a random
column-vector $\mathbf{Y}$ taking value in $\mathbb{R}^J$ whose components are given by $Y^{(1)}, \ldots, Y^{(J)}$.⁷
For typographical reasons we denote the observed realization of $\mathbf{Y}$ by $\mathbf{y}$, instead
of $\mathbf{y}_{\text{obs}}$. For every $m \in \mathcal{M}$ we have that, conditional on M = m, the components
of $\mathbf{Y}$ are independent Gaussians, with $Y^{(j)} \sim \mathcal{N}(s_m^{(j)}, \sigma^2)$, where $\mathbf{s}_m$ is some deterministic
vector of J components $s_m^{(1)}, \ldots, s_m^{(J)}$, and where $\sigma^2 > 0$. Recalling the
density of the univariate Gaussian distribution (19.6) and using the conditional independence
of the components of $\mathbf{Y}$ given M = m, we can express the conditional
density $f_{\mathbf{Y}|M=m}(\mathbf{y})$ of the vector $\mathbf{Y}$ at every point $\mathbf{y} = (y^{(1)}, \ldots, y^{(J)})^{\mathsf{T}}$ in $\mathbb{R}^J$ as

$$f_{\mathbf{Y}|M=m}(\mathbf{y}) = \frac{1}{(2\pi\sigma^2)^{J/2}} \exp\Bigl( -\frac{1}{2\sigma^2} \sum_{j=1}^{J} \bigl(y^{(j)} - s_m^{(j)}\bigr)^2 \Bigr). \tag{21.50}$$

7 Our observation now takes value in $\mathbb{R}^J$ and not as before in $\mathbb{R}^d$. My excuse for using J instead
of d is that later, when we refer to this section, d will have a different meaning and choosing J
here reduces the chance of confusion later on.

21.6.2 Optimal Guessing Rule 

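Using Theorem 21.3.3 we obtain that, having observed $\mathbf{y} = (y^{(1)}, \ldots, y^{(J)})^{\mathsf{T}} \in \mathbb{R}^J$,
an optimal decision rule is the MAP rule, which picks uniformly at random an
element from the set

$$\begin{aligned}
\tilde{\mathcal{M}}(\mathbf{y}) &= \Bigl\{ \tilde{m} \in \mathcal{M} : \pi_{\tilde{m}} f_{\mathbf{Y}|M=\tilde{m}}(\mathbf{y}) = \max_{m' \in \mathcal{M}} \bigl\{ \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}) \bigr\} \Bigr\} \\
&= \Bigl\{ \tilde{m} \in \mathcal{M} : \ln\bigl( \pi_{\tilde{m}} f_{\mathbf{Y}|M=\tilde{m}}(\mathbf{y}) \bigr) = \max_{m' \in \mathcal{M}} \bigl\{ \ln\bigl( \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}) \bigr) \bigr\} \Bigr\},
\end{aligned} \tag{21.51}$$

where the second equality follows from the strict monotonicity of the logarithm.
We next obtain a more explicit description of $\tilde{\mathcal{M}}(\mathbf{y})$ for our setup. By (21.50),

$$\ln\bigl( \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}) \bigr) = \ln \pi_m - \frac{J}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{j=1}^{J} \bigl(y^{(j)} - s_m^{(j)}\bigr)^2. \tag{21.52}$$

The term $(J/2) \ln(2\pi\sigma^2)$ is a constant term that does not depend on the hypothesis.
Consequently, it does not influence the set of messages that attain the highest score.
(The tallest student in the class is the same irrespective of whether the height of all
the students is measured when they are barefoot or when they are all wearing the
one-inch heel school uniform shoes. The heel can only make a difference if different
students wear shoes of different heel height.) Thus,

$$\tilde{\mathcal{M}}(\mathbf{y}) = \Bigl\{ \tilde{m} \in \mathcal{M} : \ln \pi_{\tilde{m}} - \sum_{j=1}^{J} \frac{(y^{(j)} - s_{\tilde{m}}^{(j)})^2}{2\sigma^2} = \max_{m' \in \mathcal{M}} \Bigl\{ \ln \pi_{m'} - \sum_{j=1}^{J} \frac{(y^{(j)} - s_{m'}^{(j)})^2}{2\sigma^2} \Bigr\} \Bigr\}.$$

The expression for $\tilde{\mathcal{M}}(\mathbf{y})$ can be further simplified if M is a priori uniformly
distributed. In this case we have

$$\begin{aligned}
\tilde{\mathcal{M}}(\mathbf{y}) &= \Bigl\{ \tilde{m} \in \mathcal{M} : -\sum_{j=1}^{J} \frac{(y^{(j)} - s_{\tilde{m}}^{(j)})^2}{2\sigma^2} = \max_{m' \in \mathcal{M}} \Bigl\{ -\sum_{j=1}^{J} \frac{(y^{(j)} - s_{m'}^{(j)})^2}{2\sigma^2} \Bigr\} \Bigr\} \\
&= \Bigl\{ \tilde{m} \in \mathcal{M} : \sum_{j=1}^{J} \bigl(y^{(j)} - s_{\tilde{m}}^{(j)}\bigr)^2 = \min_{m' \in \mathcal{M}} \Bigl\{ \sum_{j=1}^{J} \bigl(y^{(j)} - s_{m'}^{(j)}\bigr)^2 \Bigr\} \Bigr\}, \quad M \text{ uniform},
\end{aligned}$$

where the first equality follows because when M is uniform the additive term $\ln \pi_m$
is given by $\ln(1/M)$ and hence does not depend on the hypothesis; and where the
second equality follows because changing the sign of all the elements of a set changes
the largest ones to the smallest ones, and by noting that scaling the score by $2\sigma^2$
does not change the highest scoring messages (because we assumed that $\sigma^2 > 0$).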

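If we interpret the quantity

$$\|\mathbf{y} - \mathbf{s}_m\| = \sqrt{ \sum_{j=1}^{J} \bigl(y^{(j)} - s_m^{(j)}\bigr)^2 }$$

as the Euclidean distance between the vector $\mathbf{y}$ and the vector $\mathbf{s}_m$, then we see that,
for a uniform prior on M, it is optimal to guess the message m whose corresponding
mean vector $\mathbf{s}_m$ is closest to the observed vector $\mathbf{y}$. Notice that to implement this
"nearest-neighbor" decision rule we do not need to know the value of $\sigma^2$.

We next show that if, in addition to assuming a uniform prior on M, we also
assume that the vectors $\mathbf{s}_1, \ldots, \mathbf{s}_M$ all have the same norm, i.e.,

$$\|\mathbf{s}_1\| = \|\mathbf{s}_2\| = \cdots = \|\mathbf{s}_M\|, \tag{21.53}$$

then

$$\tilde{\mathcal{M}}(\mathbf{y}) = \Bigl\{ \tilde{m} \in \mathcal{M} : \sum_{j=1}^{J} y^{(j)} s_{\tilde{m}}^{(j)} = \max_{m' \in \mathcal{M}} \Bigl\{ \sum_{j=1}^{J} y^{(j)} s_{m'}^{(j)} \Bigr\} \Bigr\},$$

so the MAP decision rule guesses the message m whose mean vector $\mathbf{s}_m$ has
the "highest correlation" with the received vector $\mathbf{y}$. To see this, we note that
because M has a uniform prior the "nearest-neighbor" decoding rule is optimal,
and we then expand

$$\|\mathbf{y} - \mathbf{s}_m\|^2 = \sum_{j=1}^{J} \bigl(y^{(j)} - s_m^{(j)}\bigr)^2 = \sum_{j=1}^{J} \bigl(y^{(j)}\bigr)^2 - 2 \sum_{j=1}^{J} y^{(j)} s_m^{(j)} + \sum_{j=1}^{J} \bigl(s_m^{(j)}\bigr)^2,$$

where the first term does not depend on the hypothesis and where, by (21.53), the
third term also does not depend on the hypothesis.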

We summarize our findings in the following proposition. 

Proposition 21.6.1. Consider the problem described in Section 21.6.1 of guessing
M based on the observation $\mathbf{y}$.

(i) It is optimal to form the guess based on $\mathbf{y} = (y^{(1)}, \ldots, y^{(J)})^{\mathsf{T}}$ by choosing
uniformly at random from the set

$$\Bigl\{ \tilde{m} \in \mathcal{M} : \ln \pi_{\tilde{m}} - \sum_{j=1}^{J} \frac{(y^{(j)} - s_{\tilde{m}}^{(j)})^2}{2\sigma^2} = \max_{m' \in \mathcal{M}} \Bigl\{ \ln \pi_{m'} - \sum_{j=1}^{J} \frac{(y^{(j)} - s_{m'}^{(j)})^2}{2\sigma^2} \Bigr\} \Bigr\}.$$

(ii) If M is uniformly distributed, then this rule is equivalent to the "nearest-neighbor"
decoding rule of picking uniformly at random an element of the
set

$$\Bigl\{ \tilde{m} \in \mathcal{M} : \|\mathbf{y} - \mathbf{s}_{\tilde{m}}\| = \min_{m' \in \mathcal{M}} \bigl\{ \|\mathbf{y} - \mathbf{s}_{m'}\| \bigr\} \Bigr\}.$$

(iii) If, in addition to M being uniform, we also assume that the mean vectors
satisfy (21.53), then this rule is equivalent to the "maximum-correlation" rule
of picking at random an element of the set

$$\Bigl\{ \tilde{m} \in \mathcal{M} : \sum_{j=1}^{J} y^{(j)} s_{\tilde{m}}^{(j)} = \max_{m' \in \mathcal{M}} \Bigl\{ \sum_{j=1}^{J} y^{(j)} s_{m'}^{(j)} \Bigr\} \Bigr\}.$$
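The three descriptions in Proposition 21.6.1 can be compared numerically. The sketch below is illustrative only: the function names are hypothetical, the four equal-norm mean vectors and the observation are invented, and a small tolerance `tol` is used when comparing floating-point scores, which is an implementation convenience rather than part of the proposition.

```python
import math


def argmax_set(scores, tol=1e-12):
    """1-based indices whose score is within tol of the maximum."""
    best = max(scores)
    return {m + 1 for m, s in enumerate(scores) if best - s <= tol}


def map_set(y, means, priors, sigma):
    """Part (i): maximize ln(pi_m) - ||y - s_m||^2 / (2 sigma^2)."""
    return argmax_set([math.log(p) - math.dist(y, s) ** 2 / (2 * sigma ** 2)
                       for p, s in zip(priors, means)])


def nearest_neighbor_set(y, means):
    """Part (ii): minimize the Euclidean distance ||y - s_m||."""
    return argmax_set([-math.dist(y, s) for s in means])


def max_correlation_set(y, means):
    """Part (iii): maximize the inner product of y and s_m."""
    return argmax_set([sum(yj * sj for yj, sj in zip(y, s)) for s in means])


# Four mean vectors of equal norm in R^3, so (21.53) holds.
means = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
y = (0.2, -0.9, -0.4)
uniform = [0.25] * 4

print(map_set(y, means, uniform, sigma=2.0),
      nearest_neighbor_set(y, means),
      max_correlation_set(y, means))
```

With a uniform prior all three sets coincide (here each contains only Message 2). With a sufficiently skewed prior, the set of Part (i) can differ from the other two, which is why Parts (ii) and (iii) require M to be uniform.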

We next show that if the mean vectors $\mathbf{s}_1, \ldots, \mathbf{s}_M$ are distinct in the sense that for
every pair $m' \ne m''$ in $\mathcal{M}$ there exists at least one component where the vectors
$\mathbf{s}_{m'}$ and $\mathbf{s}_{m''}$ differ, i.e.,

$$\|\mathbf{s}_{m'} - \mathbf{s}_{m''}\| > 0, \quad m' \ne m'',$$

then the probability of ties is zero. That is, we will show that the probability of
observing a vector $\mathbf{y}$ for which the set $\tilde{\mathcal{M}}(\mathbf{y})$ (21.51) has more than one element
is zero. Stated in yet another way, the probability that the observable $\mathbf{Y}$ will be
such that the MAP will require randomization is zero. Stated one last time:

Proposition 21.6.2. If the mean vectors $\mathbf{s}_1, \ldots, \mathbf{s}_M$ in our setup are distinct, then
with probability one the observed vector $\mathbf{y}$ is such that there is a unique message of
highest a posteriori probability.

Proof. Conditional on $\mathbf{Y} = \mathbf{y}$, associate with each message $m \in \mathcal{M}$ the score
$\ln\bigl(\pi_m f_{\mathbf{Y}|M=m}(\mathbf{y})\bigr)$. We need to show that the probability of the observation $\mathbf{y}$
being such that at least two messages attain the highest score is zero. Instead, we
shall prove the stronger statement that the probability of two messages attaining
the same score (be it maximal or not) is zero.

We first show that it suffices to prove that for every $m \in \mathcal{M}$ and for every pair of
messages $m' \ne m''$, we have that, conditional on M = m, the probability that $m'$
and $m''$ attain the same score is zero, i.e.,

$$\Pr\bigl( \text{score of Message } m' = \text{score of Message } m'' \bigm| M = m \bigr) = 0, \quad m' \ne m''. \tag{21.54}$$

Indeed, once we show (21.54), it will follow that the unconditional probability that
Message $m'$ attains the same score as Message $m''$ is zero, i.e.,

$$\Pr\bigl( \text{score of Message } m' = \text{score of Message } m'' \bigr) = 0, \quad m' \ne m'', \tag{21.55}$$

because

$$\Pr\bigl( \text{score of Message } m' = \text{score of Message } m'' \bigr) = \sum_{m \in \mathcal{M}} \pi_m \Pr\bigl( \text{score of Message } m' = \text{score of Message } m'' \bigm| M = m \bigr).$$

But (21.55) implies that the probability that any two or more messages attain the
highest score is zero because

$$\begin{aligned}
\Pr(\text{two or more messages attain the highest score}) &= \Pr\Bigl( \bigcup_{\substack{m', m'' \in \mathcal{M} \\ m' \ne m''}} \{ m' \text{ and } m'' \text{ attain the highest score} \} \Bigr) \\
&\le \sum_{\substack{m', m'' \in \mathcal{M} \\ m' \ne m''}} \Pr( m' \text{ and } m'' \text{ attain the highest score} ) \\
&\le \sum_{\substack{m', m'' \in \mathcal{M} \\ m' \ne m''}} \Pr( m' \text{ and } m'' \text{ attain the same score} ),
\end{aligned}$$

where the first equality follows because more than one message attains the highest
score if, and only if, there exist two distinct messages $m'$ and $m''$ that attain
the highest score; the subsequent inequality follows from the Union Bound (Theorem 21.5.1);
and the final inequality by noting that if $m'$ and $m''$ both attain the
highest score, then they both achieve the same score.

Having established that in order to complete the proof it suffices to establish
(21.54), we proceed to do so. By (21.52) we obtain, upon opening the square,
that the observation $\mathbf{Y}$ results in Messages $m'$ and $m''$ obtaining the same score if,
and only if,

$$\frac{1}{\sigma^2} \sum_{j=1}^{J} Y^{(j)} \bigl( s_{m'}^{(j)} - s_{m''}^{(j)} \bigr) = \ln \frac{\pi_{m''}}{\pi_{m'}} + \frac{1}{2\sigma^2} \bigl( \|\mathbf{s}_{m'}\|^2 - \|\mathbf{s}_{m''}\|^2 \bigr). \tag{21.56}$$

We next show that, conditional on M = m, the probability that $\mathbf{Y}$ satisfies (21.56)
is zero. To that end we note that, conditional on M = m, the random variables
$Y^{(1)}, \ldots, Y^{(J)}$ are independent random variables with $Y^{(j)}$ being Gaussian with
mean $s_m^{(j)}$ and variance $\sigma^2$; see (21.50). Consequently, by Proposition 19.7.3, we
have that, conditional on M = m, the LHS of (21.56) is a Gaussian random variable
of variance

$$\frac{1}{\sigma^2} \|\mathbf{s}_{m'} - \mathbf{s}_{m''}\|^2,$$

which is positive because $m' \ne m''$ and because we assumed that the mean vectors
are distinct. It follows that, conditional on M = m, the LHS of (21.56) is a
Gaussian random variable of positive variance, and hence has zero probability of
being equal to the deterministic number on the RHS of (21.56). This proves (21.54),
and hence concludes the proof. □



21.6.3 The Union Bound 

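We next use the Union Bound to upper-bound the optimal probability of error
$p^*(\text{error})$. By (21.38)

$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid M = m) &\le \sum_{m' \ne m} \Pr\bigl[ \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{Y}) \ge \pi_m f_{\mathbf{Y}|M=m}(\mathbf{Y}) \bigm| M = m \bigr] \\
&= \sum_{m' \ne m} Q\Bigl( \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_m - \mathbf{s}_{m'}\|} \ln \frac{\pi_m}{\pi_{m'}} \Bigr),
\end{aligned} \tag{21.57}$$

where the equality follows from Lemma 20.14.1. From this and from the optimality
of the MAP rule (21.18) we thus obtain

$$p^*(\text{error}) \le \sum_{m \in \mathcal{M}} \pi_m \sum_{m' \ne m} Q\Bigl( \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_m - \mathbf{s}_{m'}\|} \ln \frac{\pi_m}{\pi_{m'}} \Bigr). \tag{21.58}$$

If M is uniform, these bounds simplify to:

$$p_{\text{MAP}}(\text{error} \mid M = m) \le \sum_{m' \ne m} Q\Bigl( \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} \Bigr), \quad M \text{ uniform}, \tag{21.59}$$

$$p^*(\text{error}) \le \frac{1}{M} \sum_{m \in \mathcal{M}} \sum_{m' \ne m} Q\Bigl( \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} \Bigr), \quad M \text{ uniform}. \tag{21.60}$$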

21.6.4 A Lower Bound 

We next derive a lower bound on the optimal error probability p* (error). We do so 
by lower-bounding the conditional probability of error Pmap (error \M = m) of the 
MAP rule and by then using this lower bound to derive a lower bound on p* (error) 
via (21.18). 

We note that if Message m! attains a score that is strictly higher than the one 
attained by Message m, then the MAP decoder will surely not guess "M = m." 
(The MAP may or may not guess "M = m'" depending on the score associated 
with messages other than m and m'.) Thus, for each message m! ^ m we have 

PMAp(error|M = m) > Pr[7T m / / Y |M=m'( Y ) > ^m /Y|M=m( Y ) \ M = m] (21.61) 

= Q / ||s m -s m ,|| a ln I^Y (21.62) 

V 2<r ||s m -s m /|| n m > / 

where the equality follows from Lemma 20.14.1. 

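Noting that (21.62) holds for all $m' \neq m$, we can choose $m'$ to get the tightest
bound. This yields the lower bound

$$p_{\text{MAP}}(\text{error} \mid M = m) \ge \max_{m' \in \mathcal{M} \setminus \{m\}} Q\biggl(\frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_m - \mathbf{s}_{m'}\|} \ln\frac{\pi_m}{\pi_{m'}}\biggr) \tag{21.63}$$

and hence, by (21.18),

$$p^*(\text{error}) \ge \sum_{m \in \mathcal{M}} \pi_m \max_{m' \in \mathcal{M} \setminus \{m\}} Q\biggl(\frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_m - \mathbf{s}_{m'}\|} \ln\frac{\pi_m}{\pi_{m'}}\biggr). \tag{21.64}$$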

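For uniform $M$ this expression can be simplified by noting that the Q-function is
strictly decreasing:

$$p_{\text{MAP}}(\text{error} \mid M = m) \ge Q\biggl(\min_{m' \in \mathcal{M} \setminus \{m\}} \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma}\biggr), \quad M \text{ uniform}, \tag{21.65}$$

$$p^*(\text{error}) \ge \frac{1}{\mathsf{M}} \sum_{m \in \mathcal{M}} Q\biggl(\min_{m' \in \mathcal{M} \setminus \{m\}} \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma}\biggr), \quad M \text{ uniform}. \tag{21.66}$$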
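
As a numerical illustration, the following minimal sketch evaluates the uniform-prior Union Bound (21.60) and the nearest-neighbor lower bound (21.66) for a given set of mean vectors. The function names, the 4-PSK constellation, and the values of `A` and `sigma` are illustrative assumptions, not part of the text; the Q-function is computed from the complementary error function via $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$.

```python
import math

def q_function(x):
    """Q(x): probability that a standard Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def distance(u, v):
    """Euclidean distance between two mean vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def union_upper_bound(points, sigma):
    """RHS of (21.60): average over m of the sum over m' != m of Q(d/(2 sigma))."""
    num = len(points)
    total = 0.0
    for m in range(num):
        for mp in range(num):
            if mp != m:
                total += q_function(distance(points[m], points[mp]) / (2.0 * sigma))
    return total / num

def nearest_neighbor_lower_bound(points, sigma):
    """RHS of (21.66): average over m of Q(min distance / (2 sigma))."""
    num = len(points)
    total = 0.0
    for m in range(num):
        d_min = min(distance(points[m], points[mp])
                    for mp in range(num) if mp != m)
        total += q_function(d_min / (2.0 * sigma))
    return total / num

# Hypothetical example: four points on a circle of radius A (illustrative values).
A, sigma = 1.0, 0.5
four_psk = [(0.0, A), (-A, 0.0), (0.0, -A), (A, 0.0)]
lower = nearest_neighbor_lower_bound(four_psk, sigma)
upper = union_upper_bound(four_psk, sigma)
print(lower, upper)
```

The optimal probability of error lies between the two printed values. For a constellation of only two points the two bounds coincide, as they must, because both reduce to $Q(\|\mathbf{s}_1 - \mathbf{s}_2\|/(2\sigma))$.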



21.7 Additional Reading 

For additional reading on multi-hypothesis testing see the recommended reading 
for Chapter 20. The problem of assessing the optimal probability of error for the 
multi-dimensional M-ary Gaussian hypothesis testing problem of Section 21.6 has 
received extensive attention in the coding literature. For a survey of these results 
see (Sason and Shamai, 2006). 

21.8 Exercises 

Exercise 21.1 (Ternary Gaussian Detection). Consider the following special case of the
problem discussed in Section 21.6. Here $M$ is uniformly distributed over the set $\{1, 2, 3\}$,
and the mean vectors $\mathbf{s}_1, \mathbf{s}_2, \mathbf{s}_3$ are given by

$$\mathbf{s}_1 = \mathbf{0}, \quad \mathbf{s}_2 = \mathbf{s}, \quad \mathbf{s}_3 = -\mathbf{s},$$

where $\mathbf{s}$ is some deterministic nonzero vector in $\mathbb{R}^J$. Find the conditional probability of
error of the MAP rule conditional on each hypothesis.

Exercise 21.2 (4-PSK Detection). Consider the setup of Section 21.4 with $\mathsf{M} = 4$ and
$(a_1, b_1) = (0, A)$, $(a_2, b_2) = (-A, 0)$, $(a_3, b_3) = (0, -A)$, $(a_4, b_4) = (A, 0)$.

(i) Sketch the decision regions of the MAP decision rule.

(ii) Using the Q-function, express the conditional probabilities of error of this rule
conditional on each hypothesis.

(iii) Compute an upper bound on $p_{\text{MAP}}(\text{error} \mid M = 1)$ using Proposition 21.5.3. Indicate
on the figure which events are summed two or three times. Can you improve the
bound by summing only over a subset of the alternative hypotheses?

Hint: In Part (ii) first find the probability of correct detection. 

Exercise 21.3 (A 7-ary QAM problem). Consider the problem addressed in Section 21.4
in the special case where $\mathsf{M} = 7$ and where

$$a_m = A \cos\Bigl(\frac{2\pi m}{6}\Bigr), \quad b_m = A \sin\Bigl(\frac{2\pi m}{6}\Bigr), \quad m = 1, \ldots, 6,$$

$$a_7 = 0, \quad b_7 = 0.$$

(i) Illustrate the decision regions of the MAP (nearest-neighbor) guessing rule.

(ii) Let $\mathbf{Z} = (Z^{(1)}, Z^{(2)})^{\mathsf{T}}$ be a random vector whose components are IID $\mathcal{N}(0, \sigma^2)$.
Show that for every message $m \in \{1, \ldots, 7\}$ the conditional probability of error
$p_{\text{MAP}}(\text{error} \mid M = m)$ can be upper-bounded by the probability that the Euclidean
norm of $\mathbf{Z}$ exceeds $A/2$. Calculate this probability.

(iii) What is the upper bound on $p_{\text{MAP}}(\text{error} \mid M = m)$ that Proposition 21.5.3 yields in
this case? Can you improve it by including fewer terms?

(iv) Compare the different bounds.

See (Viterbi and Omura, 1979, Chapter 2, Problem 2.2).

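Exercise 21.4 (Orthogonal Mean Vectors). Let $M$ be uniformly distributed over the set
$\mathcal{M} = \{1, \ldots, \mathsf{M}\}$. Let the observable $\mathbf{Y}$ be a random $J$-vector. Conditional on $M = m$,
the observable $\mathbf{Y}$ is given by

$$\mathbf{Y} = \sqrt{E_s}\, \boldsymbol{\phi}_m + \mathbf{Z},$$

where $\mathbf{Z}$ is a random $J$-vector whose components are IID $\mathcal{N}(0, \sigma^2)$, and where $\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\mathsf{M}}$
are orthonormal in the sense that

$$\langle \boldsymbol{\phi}_{m'}, \boldsymbol{\phi}_{m''} \rangle_E = \mathrm{I}\{m' = m''\}, \quad m', m'' \in \mathcal{M}.$$

Show that

$$p_{\text{MAP}}(\text{error} \mid M = m) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \bigl(1 - Q(\xi)\bigr)^{\mathsf{M}-1} e^{-\frac{(\xi - \alpha)^2}{2}} \, d\xi, \tag{21.67}$$

where $\alpha = \sqrt{E_s}/\sigma$.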

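Exercise 21.5 (Equi-Energy Constellations). Consider the setup of Section 21.6.1 with a
uniform prior and with $\|\mathbf{s}_1\| = \cdots = \|\mathbf{s}_{\mathsf{M}}\| = \sqrt{E_s}$. Show that the optimal probability of
correct decoding is given by

$$p^*(\text{correct}) = \frac{1}{\mathsf{M}} \exp\Bigl(-\frac{E_s}{2\sigma^2}\Bigr) \, \mathsf{E}\Bigl[\exp\Bigl(\frac{1}{\sigma^2} \max_{m \in \mathcal{M}} \langle \mathbf{V}, \mathbf{s}_m \rangle_E\Bigr)\Bigr], \tag{21.68}$$

where $\mathbf{V}$ is a random $J$-vector whose components are IID $\mathcal{N}(0, \sigma^2)$. We recommend the
following approach. Let $\mathcal{D}_1, \ldots, \mathcal{D}_{\mathsf{M}}$ be a partition of $\mathbb{R}^J$ such that for every $m \in \mathcal{M}$,

$$\mathbf{y} \in \mathcal{D}_m \implies \langle \mathbf{y}, \mathbf{s}_m \rangle_E = \max_{m' \in \mathcal{M}} \langle \mathbf{y}, \mathbf{s}_{m'} \rangle_E.$$

(i) Show that

$$p^*(\text{correct}) = \frac{1}{\mathsf{M}} \sum_{m \in \mathcal{M}} \Pr[\mathbf{Y} \in \mathcal{D}_m \mid M = m].$$

(ii) Show that the RHS of the above can be written as

$$\frac{1}{\mathsf{M}} \exp\Bigl(-\frac{E_s}{2\sigma^2}\Bigr) \int_{\mathbb{R}^J} \frac{1}{(2\pi\sigma^2)^{J/2}} \exp\Bigl(-\frac{\|\mathbf{y}\|^2}{2\sigma^2}\Bigr) \Bigl(\sum_{m \in \mathcal{M}} \mathrm{I}\{\mathbf{y} \in \mathcal{D}_m\} \exp\Bigl(\frac{1}{\sigma^2} \langle \mathbf{y}, \mathbf{s}_m \rangle_E\Bigr)\Bigr) \, d\mathbf{y}.$$

(iii) Finally show that

$$\sum_{m \in \mathcal{M}} \mathrm{I}\{\mathbf{y} \in \mathcal{D}_m\} \exp\Bigl(\frac{1}{\sigma^2} \langle \mathbf{y}, \mathbf{s}_m \rangle_E\Bigr) = \exp\Bigl(\frac{1}{\sigma^2} \max_{m \in \mathcal{M}} \langle \mathbf{y}, \mathbf{s}_m \rangle_E\Bigr), \quad \mathbf{y} \in \mathbb{R}^J.$$

See also Problem 23.7.

Exercise 21.6 (When Is the Union Bound Tight?). Under what conditions on the events
$\mathcal{V}_1, \mathcal{V}_2, \ldots$ is the Union Bound (21.32) tight?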

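Exercise 21.7 (The Union of Independent Events). Show that if the events $\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_n$
are independent then

$$\Pr\Bigl(\bigcup_{j=1}^{n} \mathcal{V}_j\Bigr) = 1 - \prod_{j=1}^{n} \bigl(1 - \Pr(\mathcal{V}_j)\bigr).$$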

Exercise 21.8 (A Lower Bound on the Probability of a Union). Show that the probability
of the union of $n$ events $\mathcal{V}_1, \ldots, \mathcal{V}_n$ can be lower-bounded by

$$\Pr\Bigl(\bigcup_{j=1}^{n} \mathcal{V}_j\Bigr) \ge \sum_{j=1}^{n} \Pr(\mathcal{V}_j) - \sum_{j=1}^{n-1} \sum_{\ell=j+1}^{n} \Pr(\mathcal{V}_j \cap \mathcal{V}_\ell).$$

Inequalities of this nature are sometimes called Bonferroni Inequalities.

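Exercise 21.9 (de Caen's Inequality). Let $X$ be a RV taking value in the finite set $\mathcal{X}$,
and let $\{\mathcal{A}_i\}_{i \in \mathcal{I}}$ be a finite family of subsets (not necessarily disjoint) of $\mathcal{X}$:

$$\mathcal{A}_i \subseteq \mathcal{X}, \quad i \in \mathcal{I}.$$

Define

$$\Pr(\mathcal{A}_i) = \Pr[X \in \mathcal{A}_i], \quad i \in \mathcal{I},$$

$$\deg(x) = \#\{i \in \mathcal{I} : x \in \mathcal{A}_i\}, \quad x \in \mathcal{X},$$

where $\#\mathcal{B}$ denotes the cardinality of a set $\mathcal{B}$.

(i) Show that

$$\Pr\Bigl(\bigcup_{i \in \mathcal{I}} \mathcal{A}_i\Bigr) = \sum_{i \in \mathcal{I}} \sum_{x \in \mathcal{A}_i} \frac{\Pr[X = x]}{\deg(x)}.$$

(ii) Use the Cauchy-Schwarz Inequality to show that for every $i \in \mathcal{I}$,

$$\Bigl(\sum_{x \in \mathcal{A}_i} \frac{\Pr[X = x]}{\deg(x)}\Bigr) \Bigl(\sum_{x \in \mathcal{A}_i} \Pr[X = x] \deg(x)\Bigr) \ge \Bigl(\sum_{x \in \mathcal{A}_i} \Pr[X = x]\Bigr)^2.$$

(iii) Use Parts (i) and (ii) to show that

$$\Pr\Bigl(\bigcup_{i \in \mathcal{I}} \mathcal{A}_i\Bigr) \ge \sum_{i \in \mathcal{I}} \frac{\Pr(\mathcal{A}_i)^2}{\sum_{x \in \mathcal{A}_i} \Pr[X = x] \deg(x)}.$$

(iv) Conclude that

$$\Pr\Bigl(\bigcup_{i \in \mathcal{I}} \mathcal{A}_i\Bigr) \ge \sum_{i \in \mathcal{I}} \frac{\Pr(\mathcal{A}_i)^2}{\sum_{j \in \mathcal{I}} \Pr(\mathcal{A}_i \cap \mathcal{A}_j)}.$$

This is de Caen's Bound (de Caen, 1997).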

Exercise 21.10 (Asymptotic Tightness of the Union Bound). Consider the hypothesis
testing problem of Section 21.6 when the prior is uniform and the mean vectors $\mathbf{s}_1, \ldots, \mathbf{s}_{\mathsf{M}}$
are distinct. Show that the Union Bound of (21.59) is asymptotically tight in the sense
that the ratio of the RHS of (21.59) to the LHS tends to one as $\sigma$ tends to zero.

Hint: Use Exercise 21.8.



Chapter 22 

Sufficient Statistics 

22.1 Introduction 

In layman's terms, a sufficient statistic for guessing M based on the observable Y 
is a random variable or a collection of random variables that contains all the infor- 
mation in Y that is relevant for guessing M. This is a particularly useful concept 
when the sufficient statistic is more concise than the observables. For example, if
we observe the results of a thousand coin tosses $Y_1, \ldots, Y_{1000}$ and we wish to test
whether the coin is fair or has a bias of 1/4, then a sufficient statistic turns out
to be the number of "heads" among the outcomes $Y_1, \ldots, Y_{1000}$.¹ Another example
was encountered in Section 20.12. There the observable was a two-dimensional 
random vector, and the sufficient statistic summarized the information that was 
relevant for guessing H in a scalar random variable; see (20.69). 

In this chapter we provide a formal definition of sufficient statistics in the multi- 
hypothesis setting and explore the concept in some detail. We shall see that our 
definition is compatible with Definition 20.12.2, which we gave for the binary case. 
We only address the case where the observations take value in the $d$-dimensional
Euclidean space $\mathbb{R}^d$. Extensions to observations consisting of a stochastic process
are discussed in Section 26.3. Also, we only treat the case of guessing among a 
finite number of alternatives. We thus consider a finite set of messages 

$$\mathcal{M} = \{1, \ldots, \mathsf{M}\}, \tag{22.1}$$

where $\mathsf{M} \ge 2$, and we assume that associated with each message $m \in \mathcal{M}$ is a density
$f_{\mathbf{Y}|M=m}(\cdot)$ on $\mathbb{R}^d$, i.e., a nonnegative Borel measurable function that integrates to
one.

The concept of sufficient statistics is defined for the family of densities

$$f_{\mathbf{Y}|M=m}(\cdot), \quad m \in \mathcal{M}; \tag{22.2}$$

it is unrelated to a prior. But when we wish to use it in the context of hypothesis
testing we need to introduce a probabilistic setting. If, in addition to the family
$\{f_{\mathbf{Y}|M=m}(\cdot)\}_{m \in \mathcal{M}}$ we introduce a prior $\{\pi_m\}_{m \in \mathcal{M}}$, then we can discuss the pair
$(M, \mathbf{Y})$, where $\Pr[M = m] = \pi_m$, and where, conditionally on $M = m$, the
distribution of $\mathbf{Y}$ is of density $f_{\mathbf{Y}|M=m}(\cdot)$. Thus, once we have introduced a prior
$\{\pi_m\}_{m \in \mathcal{M}}$ we can, for example, discuss the density $f_{\mathbf{Y}}(\cdot)$ of $\mathbf{Y}$ as in (21.11)

$$f_{\mathbf{Y}}(\mathbf{y}) = \sum_{m \in \mathcal{M}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}), \quad \mathbf{y} \in \mathbb{R}^d, \tag{22.3}$$

and the conditional distribution of $M$ conditional on $\mathbf{Y} = \mathbf{y}$ as in (21.12)

$$\Pr[M = m \mid \mathbf{Y} = \mathbf{y}] = \begin{cases} \dfrac{\pi_m f_{\mathbf{Y}|M=m}(\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})} & \text{if } f_{\mathbf{Y}}(\mathbf{y}) > 0, \\[2ex] \dfrac{1}{\mathsf{M}} & \text{otherwise,} \end{cases} \quad m \in \mathcal{M}, \ \mathbf{y} \in \mathbb{R}^d. \tag{22.4}$$

¹ Testing whether a coin is fair or not is a more complicated hypothesis testing problem of a
kind that we shall not address. It falls under the category of "composite hypothesis testing."


22.2 Definition and Main Consequence 

In this section we shall define sufficient statistics for a family of densities (22.2). 
We shall then state the main result about this notion, namely, that there is no loss 
in optimality in basing one's guess on a sufficient statistic. 

Very roughly, $T(\cdot)$ (or sometimes $T(\mathbf{Y})$) forms a sufficient statistic for guessing $M$
based on $\mathbf{Y}$ if there exists a black box that, when fed $T(\mathbf{y}_{\mathrm{obs}})$ (but not $\mathbf{y}_{\mathrm{obs}}$) and
any prior $\{\pi_m\}$ on $\mathcal{M}$, produces the a posteriori distribution of $M$ given $\mathbf{Y} = \mathbf{y}_{\mathrm{obs}}$.

For technical reasons we make two exceptions. While the black box must always
produce a probability vector, we only require that this vector be the a posteriori
distribution of $M$ given $\mathbf{Y} = \mathbf{y}_{\mathrm{obs}}$ for observations $\mathbf{y}_{\mathrm{obs}}$ that satisfy

$$\sum_{m \in \mathcal{M}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}_{\mathrm{obs}}) > 0 \tag{22.5}$$

and that lie outside some prespecified set $\mathcal{Y}_0 \subset \mathbb{R}^d$ of Lebesgue measure zero. Thus,
if $\mathbf{y}_{\mathrm{obs}}$ is in $\mathcal{Y}_0$ or if (22.5) is violated, then the output of the black box can be any
probability vector. The exception set $\mathcal{Y}_0$ is not allowed to depend on $\{\pi_m\}$. Since
it is of Lebesgue measure zero, the conditional probability that the observation $\mathbf{Y}$
lies in $\mathcal{Y}_0$ is zero:

$$\Pr[\mathbf{Y} \in \mathcal{Y}_0 \mid M = m] = 0, \quad m \in \mathcal{M}. \tag{22.6}$$

Note that the black box need not indicate whether $\mathbf{y}_{\mathrm{obs}}$ is in $\mathcal{Y}_0$ and/or whether
(22.5) holds. Figure 22.1 depicts such a black box.

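Definition 22.2.1 (Sufficient Statistics for M Densities). We say that a mapping
$T: \mathbb{R}^d \to \mathbb{R}^{d'}$ forms a sufficient statistic for the densities $f_{\mathbf{Y}|M=1}(\cdot), \ldots, f_{\mathbf{Y}|M=\mathsf{M}}(\cdot)$
on $\mathbb{R}^d$ if it is Borel measurable and if for some $\mathcal{Y}_0 \subset \mathbb{R}^d$ of Lebesgue measure zero we
have that for every prior $\{\pi_m\}$ there exist $\mathsf{M}$ Borel measurable functions from $\mathbb{R}^{d'}$
to $[0, 1]$

$$T(\mathbf{y}_{\mathrm{obs}}) \mapsto \psi_m\bigl(\{\pi_m\}, T(\mathbf{y}_{\mathrm{obs}})\bigr), \quad m \in \mathcal{M},$$

such that the vector

$$\Bigl(\psi_1\bigl(\{\pi_m\}, T(\mathbf{y}_{\mathrm{obs}})\bigr), \ldots, \psi_{\mathsf{M}}\bigl(\{\pi_m\}, T(\mathbf{y}_{\mathrm{obs}})\bigr)\Bigr)^{\mathsf{T}}$$

is a probability vector and such that this probability vector is equal to

$$\bigl(\Pr[M = 1 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}], \ldots, \Pr[M = \mathsf{M} \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]\bigr)^{\mathsf{T}} \tag{22.7}$$

whenever both the condition $\mathbf{y}_{\mathrm{obs}} \notin \mathcal{Y}_0$ and the condition

$$\sum_{m=1}^{\mathsf{M}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}_{\mathrm{obs}}) > 0 \tag{22.8}$$

are satisfied. Here (22.7) is computed for $M$ having the prior $\{\pi_m\}$ and for the
conditional law of $\mathbf{Y}$ given $M$ corresponding to the given densities.

Figure 22.1: A black box that when fed any prior $\{\pi_m\}$ and $T(\mathbf{y}_{\mathrm{obs}})$ (but
not the observation $\mathbf{y}_{\mathrm{obs}}$ directly) produces a probability vector that is equal to
$(\Pr[M = 1 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}], \ldots, \Pr[M = \mathsf{M} \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}])^{\mathsf{T}}$ whenever both the condition
$\sum_{m \in \mathcal{M}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}_{\mathrm{obs}}) > 0$ and the condition $\mathbf{y}_{\mathrm{obs}} \notin \mathcal{Y}_0$ are satisfied.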

The main result regarding sufficient statistics is that if T(-) forms a sufficient 
statistic, then — even if the transformation T(-) is not reversible — there is no loss 
in optimality in basing one's guess on T(Y). 

Proposition 22.2.2 (Guessing Based on T(Y) Is Optimal). If $T: \mathbb{R}^d \to \mathbb{R}^{d'}$
is a sufficient statistic for the $\mathsf{M}$ densities $\{f_{\mathbf{Y}|M=m}(\cdot)\}_{m \in \mathcal{M}}$, then, given any
prior $\{\pi_m\}$, there exists an optimal decision rule that bases its decision on $T(\mathbf{Y})$.



Proof. To prove the proposition we shall exhibit a decision rule that is based
on $T(\mathbf{Y})$ and that mimics the MAP rule based on $\mathbf{Y}$. Since the latter is optimal
(Theorem 21.3.3), our proposed rule must also be optimal. Let $\{\psi_m(\cdot)\}$ be as in
Definition 22.2.1. Given $\mathbf{Y} = \mathbf{y}_{\mathrm{obs}}$, the proposed decoder considers the set of all
messages $\tilde{m}$ satisfying

$$\psi_{\tilde{m}}\bigl(\{\pi_m\}, T(\mathbf{y}_{\mathrm{obs}})\bigr) = \max_{m' \in \mathcal{M}} \psi_{m'}\bigl(\{\pi_m\}, T(\mathbf{y}_{\mathrm{obs}})\bigr) \tag{22.9}$$

and picks uniformly at random from this set.

We next argue that this decision rule is optimal. To that end we shall show that,
with probability one, this guessing rule is the same as the MAP rule for guessing $M$
based on $\mathbf{Y}$. Indeed, the guess produced by this rule is identical to the one produced
by the MAP rule whenever $\mathbf{y}_{\mathrm{obs}}$ satisfies (22.8) and lies outside $\mathcal{Y}_0$. Since the
probability that $\mathbf{Y}$ satisfies (22.8) is, by (21.13), one, and since the probability
that $\mathbf{Y}$ is outside $\mathcal{Y}_0$ is, by (22.6), also one, it follows from Corollary 21.5.2 that
the probability that $\mathbf{Y}$ satisfies both (22.8) and the condition $\mathbf{Y} \notin \mathcal{Y}_0$ is also one.
Thus, the proposed guessing rule, which bases its decision only on $T(\mathbf{y}_{\mathrm{obs}})$ and
on the prior, has the same performance as the (optimal) MAP decision rule for
guessing $M$ based on $\mathbf{Y}$. □



22.3 Equivalent Conditions 

In this section we derive a number of important equivalent definitions for sufficient 
statistics. These will further clarify the concept and will also be useful in identifying 
sufficient statistics. We shall try to state the theorems rigorously, but our proofs 
will be mostly heuristic. Rigorous proofs require some Measure Theory that we 
do not wish to assume. For a rigorous measure-theoretic treatment of this topic 
see (Halmos and Savage, 1949), (Lehmann and Romano, 2005, Section 2.6), or 
(Billingsley, 1995, Section 34). 2 

22.3.1 The Factorization Theorem 

The following characterization is useful because it is purely algebraic. It explores
the form that the densities $\{f_{\mathbf{Y}|M=m}(\cdot)\}$ must have for $T(\mathbf{Y})$ to form a sufficient
statistic. Roughly speaking, $T(\cdot)$ is sufficient if the densities in the family all have
the form of a product of two functions, where the first function depends on the
message and on $T(\mathbf{y})$, and where the second function does not depend on the
message but may depend on $\mathbf{y}$. We allow, however, an exception set $\mathcal{Y}_0 \subset \mathbb{R}^d$ of
Lebesgue measure zero, so we only require that for every $m \in \mathcal{M}$

$$f_{\mathbf{Y}|M=m}(\mathbf{y}) = g_m\bigl(T(\mathbf{y})\bigr)\, h(\mathbf{y}), \quad \mathbf{y} \notin \mathcal{Y}_0. \tag{22.10}$$

Note that if such a factorization exists, then it also exists with the additional
requirement that the functions be nonnegative. Indeed, if (22.10) holds, then by
the nonnegativity of the densities

$$f_{\mathbf{Y}|M=m}(\mathbf{y}) = \bigl|f_{\mathbf{Y}|M=m}(\mathbf{y})\bigr| = \bigl|g_m\bigl(T(\mathbf{y})\bigr)\, h(\mathbf{y})\bigr| = \bigl|g_m\bigl(T(\mathbf{y})\bigr)\bigr| \, |h(\mathbf{y})|, \quad \mathbf{y} \notin \mathcal{Y}_0,$$

thus yielding a factorization with the nonnegative functions

$$\bigl\{\mathbf{y} \mapsto \bigl|g_m\bigl(T(\mathbf{y})\bigr)\bigr|\bigr\}_{m \in \mathcal{M}} \quad \text{and} \quad \mathbf{y} \mapsto |h(\mathbf{y})|.$$

Limiting ourselves to nonnegative factorizations, as we henceforth shall, is helpful
in manipulating inequalities where multiplication by negative numbers requires
changing the direction of the inequality. For our setting the Factorization Theorem
can be stated as follows.³

² Our setting is technically easier because we only consider the case where $M$ is finite and
because we restrict the observation space to $\mathbb{R}^d$.

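Theorem 22.3.1 (The Factorization Theorem). A Borel measurable function
$T: \mathbb{R}^d \to \mathbb{R}^{d'}$ forms a sufficient statistic for the $\mathsf{M}$ densities $\{f_{\mathbf{Y}|M=m}(\cdot)\}_{m \in \mathcal{M}}$
on $\mathbb{R}^d$ if, and only if, there exists a set $\mathcal{Y}_0 \subset \mathbb{R}^d$ of Lebesgue measure zero and
nonnegative Borel measurable functions $g_1, \ldots, g_{\mathsf{M}}: \mathbb{R}^{d'} \to [0, \infty)$ and $h: \mathbb{R}^d \to [0, \infty)$
such that for every $m \in \mathcal{M}$

$$f_{\mathbf{Y}|M=m}(\mathbf{y}) = g_m\bigl(T(\mathbf{y})\bigr)\, h(\mathbf{y}), \quad \mathbf{y} \in \mathbb{R}^d \setminus \mathcal{Y}_0. \tag{22.11}$$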

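Proof. We begin by showing that if $T(\cdot)$ is a sufficient statistic then there exists a
factorization of the form (22.11). Let the set $\mathcal{Y}_0$ and the functions $\{\psi_m(\cdot)\}$ be as in
Definition 22.2.1. Pick some $\pi_1, \ldots, \pi_{\mathsf{M}} > 0$ that sum to one, e.g., $\pi_m = 1/\mathsf{M}$ for
all $m \in \mathcal{M}$, and let $M$ be of the prior $\{\pi_m\}$, so $\Pr[M = m] = \pi_m$ for all $m \in \mathcal{M}$.
Let the conditional law of $\mathbf{Y}$ given $M$ be as specified by the given densities so, in
particular,

$$f_{\mathbf{Y}}(\mathbf{y}) = \sum_{m \in \mathcal{M}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}), \quad \mathbf{y} \in \mathbb{R}^d. \tag{22.12}$$

Since $\{\pi_m\}$ are strictly positive, it follows from (22.12) that

$$\bigl(f_{\mathbf{Y}}(\mathbf{y}) = 0\bigr) \implies \bigl(f_{\mathbf{Y}|M=m}(\mathbf{y}) = 0, \ m \in \mathcal{M}\bigr). \tag{22.13}$$

(The only way the sum of nonnegative numbers can be zero is if they are all zero.
Thus, $f_{\mathbf{Y}}(\mathbf{y}) = 0$ always implies that all the terms $\{\pi_m f_{\mathbf{Y}|M=m}(\mathbf{y})\}$ are zero. But
if $\{\pi_m\}$ are strictly positive, then this implies that all the terms $\{f_{\mathbf{Y}|M=m}(\mathbf{y})\}$ are
zero.)

By the definition of the functions $\{\psi_m(\cdot)\}$ and of the conditional probability (22.4),
we have for every $m \in \mathcal{M}$

$$\psi_m\bigl(\pi_1, \ldots, \pi_{\mathsf{M}}, T(\mathbf{y}_{\mathrm{obs}})\bigr) = \frac{\pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}_{\mathrm{obs}})}{f_{\mathbf{Y}}(\mathbf{y}_{\mathrm{obs}})}, \quad \bigl(\mathbf{y}_{\mathrm{obs}} \notin \mathcal{Y}_0 \text{ and } f_{\mathbf{Y}}(\mathbf{y}_{\mathrm{obs}}) > 0\bigr). \tag{22.14}$$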

We next argue that the densities factorize as

$$f_{\mathbf{Y}|M=m}(\mathbf{y}) = \underbrace{\frac{1}{\pi_m}\, \psi_m\bigl(\pi_1, \ldots, \pi_{\mathsf{M}}, T(\mathbf{y})\bigr)}_{g_m(T(\mathbf{y}))} \; \underbrace{f_{\mathbf{Y}}(\mathbf{y})}_{h(\mathbf{y})}, \quad \mathbf{y} \in \mathbb{R}^d \setminus \mathcal{Y}_0. \tag{22.15}$$

This can be argued as follows. If $f_{\mathbf{Y}}(\mathbf{y})$ is greater than zero, then (22.15) follows
directly from (22.14). And if $f_{\mathbf{Y}}(\mathbf{y})$ is equal to zero, then the RHS of (22.15) is equal
to zero and, by (22.13), the LHS is also equal to zero.

³ A different, perhaps more elegant, way to state the theorem is in terms of probability
distributions. Let $P_m$ be the probability distribution on $\mathbb{R}^d$ corresponding to $M = m$, where $m$
is in the finite set $\mathcal{M}$. Assume that $\{P_m\}$ are dominated by the $\sigma$-finite measure $\mu$. Then the
Borel measurable mapping $T: \mathbb{R}^d \to \mathbb{R}^{d'}$ forms a sufficient statistic for the family $\{P_m\}$ if, and
only if, there exists a Borel measurable nonnegative function $h(\cdot)$ from $\mathbb{R}^d$ to $\mathbb{R}$, and $\mathsf{M}$ nonnegative,
Borel measurable functions $g_m(\cdot)$ from $\mathbb{R}^{d'}$ to $\mathbb{R}$ such that for each $m \in \mathcal{M}$ the function
$\mathbf{y} \mapsto g_m(T(\mathbf{y}))\, h(\mathbf{y})$ is a version of the Radon-Nikodym derivative $dP_m/d\mu$ of $P_m$ with respect
to $\mu$; see (Billingsley, 1995, Theorem 34.6) and (Lehmann and Romano, 2005, Corollary 2.6.1).

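We next prove that if the densities factorize as in (22.11), then $T(\cdot)$ forms a
sufficient statistic. That is, we show how using the factorization (22.11) we can design
the desired black box. The inputs to the black box are the prior $\{\pi_m\}$ and $T(\mathbf{y})$.
The black box considers the vector

$$\bigl(\pi_1\, g_1(T(\mathbf{y})), \ldots, \pi_{\mathsf{M}}\, g_{\mathsf{M}}(T(\mathbf{y}))\bigr)^{\mathsf{T}}. \tag{22.16}$$

If all its components are zero, then the black box produces the uniform distribution
(or any other distribution of the reader's choice). Otherwise, it produces the above
vector but normalized to sum to one. Thus, if we denote by $\psi_m(\pi_1, \ldots, \pi_{\mathsf{M}}, T(\mathbf{y}))$
the probability that the black box assigns to $m$ when fed $\pi_1, \ldots, \pi_{\mathsf{M}}$ and $T(\mathbf{y})$,
then

$$\psi_m\bigl(\pi_1, \ldots, \pi_{\mathsf{M}}, T(\mathbf{y})\bigr) = \begin{cases} \dfrac{1}{\mathsf{M}} & \text{if } \sum_{m' \in \mathcal{M}} \pi_{m'}\, g_{m'}(T(\mathbf{y})) = 0, \\[2ex] \dfrac{\pi_m\, g_m(T(\mathbf{y}))}{\sum_{m' \in \mathcal{M}} \pi_{m'}\, g_{m'}(T(\mathbf{y}))} & \text{otherwise.} \end{cases} \tag{22.17}$$

To verify that $\psi_m(\pi_1, \ldots, \pi_{\mathsf{M}}, T(\mathbf{y})) = \Pr[M = m \mid \mathbf{Y} = \mathbf{y}]$ whenever $\mathbf{y}$ is such that
$\mathbf{y} \notin \mathcal{Y}_0$ and (22.8) holds, we first note that, by the factorization (22.11),

$$\bigl(f_{\mathbf{Y}}(\mathbf{y}) > 0 \text{ and } \mathbf{y} \notin \mathcal{Y}_0\bigr) \implies \Bigl(h(\mathbf{y}) \sum_{m'=1}^{\mathsf{M}} \pi_{m'}\, g_{m'}(T(\mathbf{y})) > 0\Bigr) \implies \Bigl(h(\mathbf{y}) > 0 \text{ and } \sum_{m'=1}^{\mathsf{M}} \pi_{m'}\, g_{m'}(T(\mathbf{y})) > 0\Bigr). \tag{22.18}$$

Consequently, if $\mathbf{y} \notin \mathcal{Y}_0$ and if (22.8) holds, then by (22.18) and (22.17)

$$\bigl(\psi_1(\pi_1, \ldots, \pi_{\mathsf{M}}, T(\mathbf{y})), \ldots, \psi_{\mathsf{M}}(\pi_1, \ldots, \pi_{\mathsf{M}}, T(\mathbf{y}))\bigr)^{\mathsf{T}}$$

is equal to the vector in (22.16) but scaled so that its components add to one. But
the a posteriori probability vector is also a scaled version of (22.16) (scaled by
$h(\mathbf{y})/f_{\mathbf{Y}}(\mathbf{y})$) that sums to one. Thus, if $\mathbf{y} \notin \mathcal{Y}_0$ and (22.8) holds, then the vector
produced by the black box is identical to the a posteriori distribution vector. □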
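
The black box of (22.17) can be sketched numerically. The example below is only an illustration under assumptions that are not in the text: the observation consists of $d$ IID components that, conditional on $M = m$, are $\mathcal{N}(\mu_m, 1)$, so that the densities factorize with $T(\mathbf{y}) = \sum_j y_j$, $g_m(t) = \exp(\mu_m t - d\mu_m^2/2)$, and $h(\mathbf{y}) = (2\pi)^{-d/2} \exp(-\|\mathbf{y}\|^2/2)$. The function names and the numerical values are hypothetical.

```python
import math

def black_box(prior, t, means, d):
    """Posterior of M computed from the prior and t = T(y) only, as in (22.17)."""
    # Assumed factor g_m(t) = exp(mu_m * t - d * mu_m^2 / 2); h(y) never enters.
    weights = [p * math.exp(mu * t - d * mu * mu / 2.0)
               for p, mu in zip(prior, means)]
    total = sum(weights)
    if total == 0.0:
        # All components zero: output the uniform distribution.
        return [1.0 / len(prior)] * len(prior)
    return [w / total for w in weights]

def posterior_from_densities(prior, y, means):
    """Posterior of M computed from the full observation y, as in (22.4)."""
    d = len(y)
    dens = [math.exp(-0.5 * sum((yj - mu) ** 2 for yj in y))
            / (2.0 * math.pi) ** (d / 2.0)
            for mu in means]
    f_y = sum(p * f for p, f in zip(prior, dens))
    return [p * f / f_y for p, f in zip(prior, dens)]

# Illustrative values (assumptions, not from the text).
means = [-1.0, 0.0, 2.0]
prior = [0.5, 0.3, 0.2]
y_obs = [0.3, -0.4, 1.1, 0.2]

from_t = black_box(prior, sum(y_obs), means, len(y_obs))
from_y = posterior_from_densities(prior, y_obs, means)
print(from_t)
print(from_y)
```

The two printed vectors agree up to rounding, which reflects the scaling argument in the proof: the factor $h(\mathbf{y})$ cancels in the normalization.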

22.3.2 Pairwise sufficiency 

We next clarify the connection between sufficient statistics for binary hypothesis
testing and for multi-hypothesis testing. We show that $T(\mathbf{Y})$ forms a sufficient
statistic for the family of densities $\{f_{\mathbf{Y}|M=m}(\cdot)\}_{m \in \mathcal{M}}$ if, and only if, for every pair
of messages $m' \neq m''$ in $\mathcal{M}$ we have that $T(\mathbf{Y})$ forms a sufficient statistic for the
densities $f_{\mathbf{Y}|M=m'}(\cdot)$ and $f_{\mathbf{Y}|M=m''}(\cdot)$.

One part of this statement is trivial, namely, that if $T(\cdot)$ is sufficient for the family
$\{f_{\mathbf{Y}|M=m}(\cdot)\}_{m \in \mathcal{M}}$ then it is also sufficient for any pair. Indeed, by the Factorization
Theorem (Theorem 22.3.1), the sufficiency of $T(\cdot)$ for the family implies the
existence of a set of Lebesgue measure zero $\mathcal{Y}_0 \subset \mathbb{R}^d$ and functions $\{g_m\}_{m \in \mathcal{M}}$, $h$
such that for all $\mathbf{y} \in \mathbb{R}^d \setminus \mathcal{Y}_0$

$$f_{\mathbf{Y}|M=m}(\mathbf{y}) = g_m\bigl(T(\mathbf{y})\bigr)\, h(\mathbf{y}), \quad m \in \mathcal{M}. \tag{22.19}$$

In particular, if we limit ourselves to $m', m'' \in \mathcal{M}$ then for $\mathbf{y} \notin \mathcal{Y}_0$

$$f_{\mathbf{Y}|M=m'}(\mathbf{y}) = g_{m'}\bigl(T(\mathbf{y})\bigr)\, h(\mathbf{y}),$$

$$f_{\mathbf{Y}|M=m''}(\mathbf{y}) = g_{m''}\bigl(T(\mathbf{y})\bigr)\, h(\mathbf{y}),$$

which, by the Factorization Theorem, implies the sufficiency of $T(\cdot)$ for the pair of
densities $f_{\mathbf{Y}|M=m'}(\cdot)$, $f_{\mathbf{Y}|M=m''}(\cdot)$.

The nontrivial part of the proposition is that pairwise sufficiency implies sufficiency. 
Even this is quite easy when the densities are all strictly positive. It is a bit more 
tricky without this assumption. 4 

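Proposition 22.3.2 (Pairwise Sufficiency Implies Sufficiency). Consider $\mathsf{M}$ densities
$\{f_{\mathbf{Y}|M=m}(\cdot)\}_{m \in \mathcal{M}}$ on $\mathbb{R}^d$, and assume that $T: \mathbb{R}^d \to \mathbb{R}^{d'}$ forms a sufficient
statistic for every pair of densities $f_{\mathbf{Y}|M=m'}(\cdot)$, $f_{\mathbf{Y}|M=m''}(\cdot)$, where $m' \neq m''$ are
both in $\mathcal{M}$. Then $T(\cdot)$ is a sufficient statistic for the $\mathsf{M}$ densities $\{f_{\mathbf{Y}|M=m}(\cdot)\}_{m \in \mathcal{M}}$.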

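Proof. To prove that $T(\cdot)$ forms a sufficient statistic for $\{f_{\mathbf{Y}|M=m}(\cdot)\}_{m=1}^{\mathsf{M}}$ we shall
describe an algorithm (black box) that when fed any prior $\{\pi_m\}$ and $T(\mathbf{y}_{\mathrm{obs}})$ (but
not $\mathbf{y}_{\mathrm{obs}}$) produces an $\mathsf{M}$-dimensional probability vector that is equal to the a
posteriori probability distribution vector

$$\bigl(\Pr[M = 1 \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}], \ldots, \Pr[M = \mathsf{M} \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]\bigr)^{\mathsf{T}} \tag{22.20}$$

whenever $\mathbf{y}_{\mathrm{obs}} \in \mathbb{R}^d$ is such that

$$\mathbf{y}_{\mathrm{obs}} \notin \mathcal{Y}_0 \quad \text{and} \quad \sum_{m=1}^{\mathsf{M}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}_{\mathrm{obs}}) > 0, \tag{22.21}$$

where $\mathcal{Y}_0$ is a subset of $\mathbb{R}^d$ that does not depend on the prior $\{\pi_m\}$ and that is of
Lebesgue measure zero.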

To describe the algorithm we first use the Factorization Theorem (Theorem 22.3.1)
to recast the proposition's hypothesis as saying that for every pair $m' \neq m''$ in $\mathcal{M}$
there exists a set $\mathcal{Y}_0^{(m',m'')} \subset \mathbb{R}^d$ of Lebesgue measure zero and there exist
nonnegative functions $g_{m'}^{(m',m'')}, g_{m''}^{(m',m'')}: \mathbb{R}^{d'} \to \mathbb{R}$ and $h^{(m',m'')}: \mathbb{R}^d \to \mathbb{R}$ such that

$$f_{\mathbf{Y}|M=m'}(\mathbf{y}) = g_{m'}^{(m',m'')}\bigl(T(\mathbf{y})\bigr)\, h^{(m',m'')}(\mathbf{y}), \quad \mathbf{y} \in \mathbb{R}^d \setminus \mathcal{Y}_0^{(m',m'')}, \tag{22.22a}$$

$$f_{\mathbf{Y}|M=m''}(\mathbf{y}) = g_{m''}^{(m',m'')}\bigl(T(\mathbf{y})\bigr)\, h^{(m',m'')}(\mathbf{y}), \quad \mathbf{y} \in \mathbb{R}^d \setminus \mathcal{Y}_0^{(m',m'')}. \tag{22.22b}$$

Let

$$\mathcal{Y}_0 = \bigcup_{\substack{m', m'' \in \mathcal{M} \\ m' \neq m''}} \mathcal{Y}_0^{(m',m'')}, \tag{22.23}$$

and note that, being the union of a finite number of sets of Lebesgue measure zero,
$\mathcal{Y}_0$ is of Lebesgue measure zero.

⁴ This result does not extend to the case where the random variable $M$ can take on infinitely
many values.

We now use the above functions $g_{m'}^{(m',m'')}$, $g_{m''}^{(m',m'')}$ to describe the algorithm. Note
that $\mathbf{y}_{\mathrm{obs}}$ is never fed directly to the algorithm; only $T(\mathbf{y}_{\mathrm{obs}})$ is used. Let the prior

$$\pi_m = \Pr[M = m], \quad m \in \mathcal{M} \tag{22.24}$$

be given, and assume without loss of generality that it is nondegenerate in the
sense that

$$\pi_m > 0, \quad m \in \mathcal{M}. \tag{22.25}$$

(If that is not the case, we can set the black box to produce 0 in the coordinates
of the output vector corresponding to messages of prior probability zero and then
proceed to ignore such messages.) Let $\mathbf{y}_{\mathrm{obs}} \in \mathbb{R}^d$ be arbitrary.

There are two phases to the algorithm. In the first phase the algorithm produces
some $m^* \in \mathcal{M}$ whose a posteriori probability is guaranteed to be positive
whenever (22.21) holds. In fact, if (22.21) holds, then no message has an a posteriori
probability higher than that of $m^*$ (but this is immaterial to us because we are
not content with showing that from $T(\mathbf{y}_{\mathrm{obs}})$ we can compute the message that a
posteriori has the highest probability; we want to be able to compute the entire
a posteriori probability vector). In the second phase the algorithm uses $m^*$ to
compute the desired a posteriori probability vector.

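The first phase of the algorithm runs in $\mathsf{M}$ steps. In Step 1 we set $m[1] = 1$. In
Step 2 we set

$$m[2] = \begin{cases} 1 & \text{if } \dfrac{\pi_1\, g_1^{(1,2)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)}{\pi_2\, g_2^{(1,2)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)} > 1, \\[2ex] 2 & \text{otherwise.} \end{cases}$$

And in Step $\nu$ for $\nu \in \{2, \ldots, \mathsf{M}\}$ we set

$$m[\nu] = \begin{cases} m[\nu-1] & \text{if } \dfrac{\pi_{m[\nu-1]}\, g_{m[\nu-1]}^{(m[\nu-1],\nu)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)}{\pi_\nu\, g_\nu^{(m[\nu-1],\nu)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)} > 1, \\[2ex] \nu & \text{otherwise.} \end{cases} \tag{22.26}$$

Here we use the convention that $a/0 = +\infty$ whenever $a > 0$ and that $0/0 = 1$. We
complete the first phase by setting

$$m^* = m[\mathsf{M}]. \tag{22.27}$$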

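In the second phase we compute the vector

$$\alpha[m] = \frac{\pi_m\, g_m^{(m,m^*)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)}{\pi_{m^*}\, g_{m^*}^{(m,m^*)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)}, \quad m \in \mathcal{M}. \tag{22.28}$$
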


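If at least one of the components of $\alpha[\cdot]$ is $+\infty$, then we produce as the algorithm's
output the uniform distribution on $\mathcal{M}$. (The output corresponding to this case is
immaterial because it will turn out that this case is only possible if $\mathbf{y}_{\mathrm{obs}}$ is such
that either $\mathbf{y}_{\mathrm{obs}} \in \mathcal{Y}_0$ or $\sum_m \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}_{\mathrm{obs}}) = 0$, in which case the algorithm's
output is not required to be equal to the a posteriori distribution.) Otherwise, the
algorithm's output is the vector

$$\biggl(\frac{\alpha[1]}{\sum_{m=1}^{\mathsf{M}} \alpha[m]}, \ldots, \frac{\alpha[\mathsf{M}]}{\sum_{m=1}^{\mathsf{M}} \alpha[m]}\biggr)^{\mathsf{T}}. \tag{22.29}$$
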

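Having described the algorithm, we now proceed to prove that it produces the
a posteriori probability vector whenever (22.21) holds. We need to show that if
(22.21) holds then

$$\Pr[M = m \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}] = \frac{\alpha[m]}{\sum_{m'=1}^{\mathsf{M}} \alpha[m']}, \quad m \in \mathcal{M}. \tag{22.30}$$

Since there is nothing to prove if (22.21) does not hold, we shall henceforth assume
for the rest of the proof that it does. In this case we have by (22.4)

$$\Pr[M = m \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}] = \frac{\pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}_{\mathrm{obs}})}{f_{\mathbf{Y}}(\mathbf{y}_{\mathrm{obs}})}, \quad m \in \mathcal{M}. \tag{22.31}$$

We shall prove (22.30) in two steps. In the first step we show that the result $m^*$
of the algorithm's first phase satisfies

$$\Pr[M = m^* \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}] > 0. \tag{22.32}$$

To establish (22.32) we shall prove the stronger statement that

$$\Pr[M = m^* \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}] = \max_{m \in \mathcal{M}} \Pr[M = m \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]. \tag{22.33}$$

This latter statement follows from the more general claim that for any $\nu \in \mathcal{M}$ (and
not only for $\nu = \mathsf{M}$) we have, subject to (22.21),

$$\Pr\bigl[M = m[\nu] \bigm| \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}\bigr] = \max_{1 \le m \le \nu} \Pr[M = m \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]. \tag{22.34}$$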

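For $\nu = 1$, Statement (22.34) is trivial. For $2 \le \nu \le \mathsf{M}$, (22.34) follows from

$$\Pr\bigl[M = m[\nu] \bigm| \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}\bigr] = \max\Bigl\{\Pr[M = \nu \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}], \ \Pr\bigl[M = m[\nu-1] \bigm| \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}\bigr]\Bigr\}, \tag{22.35}$$

which we now prove. We prove (22.35) by considering two cases separately, depending
on whether $\Pr[M = \nu \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]$ and $\Pr[M = m[\nu-1] \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]$ are both zero
or not. In the former case there is nothing to prove because (22.35) holds irrespective
of whether (22.26) results in $m[\nu]$ being set to $\nu$ or to $m[\nu-1]$. In the latter
case we have by (22.31) and (22.25) that $f_{\mathbf{Y}|M=\nu}(\mathbf{y}_{\mathrm{obs}})$ and $f_{\mathbf{Y}|M=m[\nu-1]}(\mathbf{y}_{\mathrm{obs}})$
are not both zero. Consequently, by (22.22), in this case $h^{(m[\nu-1],\nu)}(\mathbf{y}_{\mathrm{obs}})$ is not
only nonnegative but strictly positive. It follows that the choice (22.26) guarantees
(22.35) because

$$\frac{\pi_{m[\nu-1]}\, g_{m[\nu-1]}^{(m[\nu-1],\nu)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)}{\pi_\nu\, g_\nu^{(m[\nu-1],\nu)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)} = \frac{\pi_{m[\nu-1]}\, g_{m[\nu-1]}^{(m[\nu-1],\nu)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)\, h^{(m[\nu-1],\nu)}(\mathbf{y}_{\mathrm{obs}})}{\pi_\nu\, g_\nu^{(m[\nu-1],\nu)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)\, h^{(m[\nu-1],\nu)}(\mathbf{y}_{\mathrm{obs}})}$$

$$= \frac{\pi_{m[\nu-1]}\, g_{m[\nu-1]}^{(m[\nu-1],\nu)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)\, h^{(m[\nu-1],\nu)}(\mathbf{y}_{\mathrm{obs}}) / f_{\mathbf{Y}}(\mathbf{y}_{\mathrm{obs}})}{\pi_\nu\, g_\nu^{(m[\nu-1],\nu)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)\, h^{(m[\nu-1],\nu)}(\mathbf{y}_{\mathrm{obs}}) / f_{\mathbf{Y}}(\mathbf{y}_{\mathrm{obs}})}$$

$$= \frac{\Pr\bigl[M = m[\nu-1] \bigm| \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}\bigr]}{\Pr[M = \nu \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]},$$

where the first equality follows because $h^{(m[\nu-1],\nu)}(\mathbf{y}_{\mathrm{obs}})$ is strictly positive; the
second because in this part of the proof we are assuming (22.21); and where the
last equality follows from (22.22) and (22.31). This establishes (22.35), which
implies (22.34), which in turn implies (22.33), which in turn implies (22.32), and
thus concludes the proof of the first step.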

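In the second step of the proof we use (22.32) to establish (22.30). This is
straightforward because, in view of (22.31), we have that (22.32) implies that
$f_{\mathbf{Y}|M=m^*}(\mathbf{y}_{\mathrm{obs}}) > 0$ so, by (22.22b), we have that

$$h^{(m,m^*)}(\mathbf{y}_{\mathrm{obs}}) > 0, \quad m \in \mathcal{M},$$

$$g_{m^*}^{(m,m^*)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr) > 0, \quad m \in \mathcal{M}.$$

Consequently

$$\alpha[m] = \frac{\pi_m\, g_m^{(m,m^*)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)}{\pi_{m^*}\, g_{m^*}^{(m,m^*)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)} = \frac{\pi_m\, g_m^{(m,m^*)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)\, h^{(m,m^*)}(\mathbf{y}_{\mathrm{obs}})}{\pi_{m^*}\, g_{m^*}^{(m,m^*)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)\, h^{(m,m^*)}(\mathbf{y}_{\mathrm{obs}})}$$

$$= \frac{\pi_m\, g_m^{(m,m^*)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)\, h^{(m,m^*)}(\mathbf{y}_{\mathrm{obs}}) / f_{\mathbf{Y}}(\mathbf{y}_{\mathrm{obs}})}{\pi_{m^*}\, g_{m^*}^{(m,m^*)}\bigl(T(\mathbf{y}_{\mathrm{obs}})\bigr)\, h^{(m,m^*)}(\mathbf{y}_{\mathrm{obs}}) / f_{\mathbf{Y}}(\mathbf{y}_{\mathrm{obs}})} = \frac{\Pr[M = m \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]}{\Pr[M = m^* \mid \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}]},$$

from which (22.30) follows by (22.32). □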
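
The two-phase algorithm of the proof can be sketched in code. The sketch below is an illustration only and rests on several assumptions that are not in the text: messages are indexed from 0, the pairwise factors are supplied through a hypothetical callback `g(m, pair, t)`, the component $\alpha[m^*]$ is taken to be 1 (the ratio of the a posteriori probability of $m^*$ to itself), and the example family is a set of Gaussian densities with IID unit-variance components, for which a single factor $\exp(\mu_m t - d\mu_m^2/2)$ serves every pair.

```python
import math

def ratio(a, b):
    """a / b with the conventions a/0 = +inf for a > 0 and 0/0 = 1."""
    if b == 0.0:
        return 1.0 if a == 0.0 else math.inf
    return a / b

def posterior_from_pairwise(prior, g, t):
    """Two-phase algorithm; prior must be strictly positive, cf. (22.25)."""
    num = len(prior)
    # Phase 1: running candidate m[v], cf. (22.26) and (22.27).
    m_star = 0
    for v in range(1, num):
        pair = (m_star, v)
        r = ratio(prior[m_star] * g(m_star, pair, t), prior[v] * g(v, pair, t))
        if not r > 1.0:
            m_star = v
    # Phase 2: ratios alpha[m] relative to m_star, cf. (22.28).
    alpha = []
    for m in range(num):
        if m == m_star:
            alpha.append(1.0)  # assumption: alpha[m*] = 1
        else:
            pair = (m, m_star)
            alpha.append(ratio(prior[m] * g(m, pair, t),
                               prior[m_star] * g(m_star, pair, t)))
    if any(math.isinf(a) for a in alpha):
        return [1.0 / num] * num  # immaterial case: uniform output
    total = sum(alpha)
    return [a / total for a in alpha]  # cf. (22.29)

# Hypothetical example: d IID N(mu_m, 1) components, T(y) = sum of components.
means = [-1.0, 0.0, 2.0]
prior = [0.5, 0.3, 0.2]
y_obs = [0.3, -0.4, 1.1, 0.2]
d = len(y_obs)

def g(m, pair, t):
    # The same factor works for every pair in this Gaussian example.
    return math.exp(means[m] * t - d * means[m] ** 2 / 2.0)

post = posterior_from_pairwise(prior, g, sum(y_obs))

# Direct computation from the full densities, as in (22.31), for comparison.
dens = [math.exp(-0.5 * sum((yj - mu) ** 2 for yj in y_obs)) for mu in means]
f_y = sum(p * f for p, f in zip(prior, dens))
direct = [p * f / f_y for p, f in zip(prior, dens)]
print(post)
print(direct)
```

In this example the first phase is not strictly needed, since all densities are positive; it matters when some of the densities vanish at the observation, which is the case the conventions for division by zero are designed to handle.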

22.3.3 Markov Condition 

We now characterize sufficient statistics using Markov chains and conditional
independence. These concepts were introduced in Section 20.11. The key result we
ask the reader to recall is Theorem 20.11.3. We rephrase it for our present setting
as follows.

Proposition 22.3.3. The statement that $M$ ⊸− $T(\mathbf{Y})$ ⊸− $\mathbf{Y}$ forms a Markov chain
is equivalent to each of the following statements:

(a) The conditional distribution of $M$ given $(T(\mathbf{Y}), \mathbf{Y})$ is the same as given $T(\mathbf{Y})$.

(b) $M$ and $\mathbf{Y}$ are conditionally independent given $T(\mathbf{Y})$.

(c) The conditional distribution of $\mathbf{Y}$ given $(M, T(\mathbf{Y}))$ is the same as given $T(\mathbf{Y})$.

Statement (a) can also be written as: 

(a') The conditional distribution of M given Y is the same as given T(Y). 

Indeed, the conditional distribution of any random variable — in particular M — 
given (T(Y), Y) is the same as given Y only, because T(Y) carries no information 
that is not in Y. 

Statement (a') can be rephrased as saying that the conditional distribution of M 
given Y can be computed from T(Y). Since this is the key requirement of sufficient 
statistics, we obtain: 

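Proposition 22.3.4. A Borel measurable function $T: \mathbb{R}^d \to \mathbb{R}^{d'}$ forms a sufficient
statistic for the $\mathsf{M}$ densities $\{f_{\mathbf{Y}|M=m}(\cdot)\}_{m \in \mathcal{M}}$ if, and only if, for any prior $\{\pi_m\}$

$$M \text{ ⊸− } T(\mathbf{Y}) \text{ ⊸− } \mathbf{Y} \tag{22.36}$$

forms a Markov chain.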

Proof. The proof of this proposition is omitted. It is not difficult, but it requires 
some measure-theoretic tools. 5 □ 

Using Proposition 22.3.4 and Proposition 22.3.3 (cf. (b)) we obtain that a Borel
measurable function $T(\cdot)$ forms a sufficient statistic for guessing $M$ based on $\mathbf{Y}$ if,
and only if, for any prior $\{\pi_m\}$ on $\mathcal{M}$, the message $M$ and the observation $\mathbf{Y}$ are
conditionally independent given $T(\mathbf{Y})$.

We next explore the implications of Proposition 22.3.4 and the equivalence of the
Markovity $M$ ⊸− $T(\mathbf{Y})$ ⊸− $\mathbf{Y}$ and Statement (c) in Proposition 22.3.3. These imply
that a Borel measurable function $T(\cdot)$ forms a sufficient statistic if, and only if, the
conditional distribution of $\mathbf{Y}$ given $(T(\mathbf{Y}), M = m)$ is the same for all $m \in \mathcal{M}$. Or,
in other words, a Borel measurable function $T(\cdot)$ forms a sufficient statistic if, and
only if, the conditional distribution of $\mathbf{Y}$ given $T(\mathbf{Y})$ does not depend on which
of the densities in $\{f_{\mathbf{Y}|M=m}(\cdot)\}$ governs the law of $\mathbf{Y}$. This characterization has
interesting implications regarding the possibility of simulating observables. These
implications are explored next.



⁵ If $T(\cdot)$ forms a sufficient statistic, then by Definition 22.2.1 $\psi_m(\{\pi_m\}, T(\mathbf{Y}))$ is a version of
the conditional probability that $M = m$ conditional on the $\sigma$-algebra generated by $\mathbf{Y}$, and it is
also measurable with respect to the $\sigma$-algebra generated by $T(\mathbf{Y})$. The reverse direction follows
from (Lehmann and Romano, 2005, Lemma 2.3.1).

Figure 22.2: If $T(\mathbf{Y})$ forms a sufficient statistic for guessing $M$ based on $\mathbf{Y}$, then,
even though $\mathbf{Y}$ cannot typically be recovered from $T(\mathbf{Y})$, the performance of any
given detector based on $\mathbf{Y}$ can be achieved based on $T(\mathbf{Y})$ and a local random
number generator as follows. Using $T(\mathbf{y}_{\mathrm{obs}})$ and local randomness $\Theta$, one produces
a $\tilde{\mathbf{Y}}$ whose conditional law given $M = m$ is the same as that of $\mathbf{Y}$, for each $m \in \mathcal{M}$.
One then feeds $\tilde{\mathbf{Y}}$ to the given detector.



22.3.4 Simulating Observables 

For T(Y) to form a sufficient statistic, we do not require that T(-) be invertible, i.e., 
that Y be recoverable from T(Y). Indeed, the notion of sufficient statistics is most 
useful when this transformation is not invertible, in which case T(-) "summarizes" 
the information in the observation Y that is needed for guessing M . Nevertheless, 
as we shall next show, if T(Y) forms a sufficient statistic, then from T(Y) we 
can produce (using a local random number generator) a vector Y that appears 
statistically like Y in the sense that the conditional law of Y given M is identical 
to the conditional law of Y given M. 

To expand on this, we first explain what we mean by "we can produce . . . Ỹ" and
then elaborate on the consequences of the vector Ỹ having the same conditional
law given M = m as Y. By "producing" Ỹ from T(Y) we mean that Ỹ is the
result of processing T(Y) with respect to M. Stated differently, for every t ∈ ℝ^{d'}
there corresponds a probability distribution P_{Ỹ|t} (not dependent on m) that can
be used to generate Ỹ as follows: having observed T(y_obs), we use a local random
number generator to generate the vector Ỹ according to the distribution P_{Ỹ|t},
where t = T(y_obs); see Figure 22.2.

By Ỹ appearing statistically the same as Y we mean that the conditional law of Ỹ
given M = m is the same as that of Y, i.e., is of density f_{Y|M=m}(·). Consequently,
anything that can be learned about M from Y can also be learned about M from Ỹ.
Also, any guessing device that was designed to guess M based on the input Y will
yield the same probability of error when, instead of being fed Y, it is fed Ỹ. Thus,
if p(error|M = m) is the conditional error probability associated with a guessing
device that is fed Y, then it is also the conditional probability of error that will be
incurred by this device if, rather than Y, it is fed Ỹ; see Figure 22.2.

Before stating this as a theorem, let us consider the following simple example.
Suppose that our observation consists of d random variables Y_1, . . . , Y_d and that,
conditional on H = 0, these random variables are IID Bernoulli(p_0), i.e., they
each take on the value 1 with probability p_0 and the value 0 with probability
1 − p_0. Conditional on H = 1, these d random variables are IID Bernoulli(p_1).
Here 0 < p_0, p_1 < 1 and p_0 ≠ p_1. Consequently, the conditional probability mass
functions are

\[
P_{Y_1,\ldots,Y_d|H=0}(y_1,\ldots,y_d)
  = \prod_{j=1}^{d} p_0^{y_j} (1-p_0)^{1-y_j}
  = p_0^{\sum_{j=1}^{d} y_j} \, (1-p_0)^{d-\sum_{j=1}^{d} y_j}
\]

and

\[
P_{Y_1,\ldots,Y_d|H=1}(y_1,\ldots,y_d)
  = p_1^{\sum_{j=1}^{d} y_j} \, (1-p_1)^{d-\sum_{j=1}^{d} y_j},
\]

so T(Y_1, . . . , Y_d) = Σ_{j=1}^{d} Y_j forms a sufficient statistic by the Factorization
Theorem.⁶ From T(y_1, . . . , y_d) one cannot recover the sequence y_1, . . . , y_d. Indeed,
specifying that T(y_1, . . . , y_d) = t does not determine which of the random
variables is one; it only determines how many of them are one. There are thus
(d choose t) possible outcomes (y_1, . . . , y_d) that are consistent with T(y_1, . . . , y_d)
being equal to t. We leave it to the reader to verify that if we use a local random
number generator to pick one of these outcomes uniformly at random then the result
(Ỹ_1, . . . , Ỹ_d) will have the same conditional law given H as (Y_1, . . . , Y_d). We do
not, of course, guarantee that (Ỹ_1, . . . , Ỹ_d) be identical to (Y_1, . . . , Y_d). (The
transformation T(·) is, after all, not reversible.)
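
The verification left to the reader in the Bernoulli example can be checked
numerically for small d. The following Python sketch is only an illustration;
the function names are hypothetical and not taken from the text. It draws a
binary tuple uniformly among those with a prescribed number of ones, and it
computes exactly (with rational arithmetic) the law of the resampled tuple when
the observed tuple is IID Bernoulli(p).

```python
from fractions import Fraction
from itertools import product
from math import comb
import random


def resample_given_sum(t, d, rng=random):
    """Pick uniformly one of the C(d, t) binary d-tuples with exactly t ones."""
    ones = set(rng.sample(range(d), t))
    return tuple(1 if j in ones else 0 for j in range(d))


def bernoulli_pmf(y, p):
    """PMF of d IID Bernoulli(p) bits evaluated at the tuple y."""
    t = sum(y)
    return p**t * (1 - p) ** (len(y) - t)


def law_of_resampled(d, p):
    """Exact law of the resampled tuple when the observation is IID Bernoulli(p)."""
    law = {}
    for y_tilde in product((0, 1), repeat=d):
        t = sum(y_tilde)
        # Pr[T = t], obtained by summing the PMF over all tuples with t ones.
        prob_t = sum(
            bernoulli_pmf(y, p) for y in product((0, 1), repeat=d) if sum(y) == t
        )
        # Given T = t, each consistent tuple is picked with probability 1 / C(d, t).
        law[y_tilde] = prob_t / comb(d, t)
    return law
```

Comparing law_of_resampled(d, p) with bernoulli_pmf for p = p_0 and for p = p_1
shows that the resampling rule, which does not depend on the hypothesis,
reproduces both conditional laws.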

For additional insight let us consider our example of (20.66). For
T(y_1, y_2) = y_1² + y_2² we can generate Ỹ from a uniform random variable
Θ ~ U([0, 1)) as

\[
\tilde{Y}_1 = \sqrt{T(\mathbf{Y})}\,\cos(2\pi\Theta), \qquad
\tilde{Y}_2 = \sqrt{T(\mathbf{Y})}\,\sin(2\pi\Theta).
\]

That is, after observing T(y_obs) = t, we generate (Ỹ_1, Ỹ_2) uniformly over the tuples
that are at radius √t from the origin.
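
The generation rule for this example is short enough to write out. The sketch
below is illustrative only (the function name is hypothetical): it takes the
observed value t of the statistic and returns a point on the circle of radius
√t, using a uniform draw as the local randomness.

```python
import math
import random


def simulate_pair(t, rng=random):
    """Return a point drawn uniformly on the circle of radius sqrt(t)."""
    theta = rng.random()  # local randomness, uniform over [0, 1)
    radius = math.sqrt(t)
    angle = 2 * math.pi * theta
    return radius * math.cos(angle), radius * math.sin(angle)
```

By construction the returned pair has the same squared norm t as the
observation, but in general it differs from the observed pair itself.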

This last example also demonstrates the difficulty of stating the result. The random
vector Y in this example has a density, both when conditioned on H = 0 and when
conditioned on H = 1. The same applies to the random variable T(Y). However,
the distribution that is used to generate Ỹ from T(Y) is neither discrete nor has
a density. All its mass is concentrated on the circle of radius √t, so it cannot have
a density, and it is uniformly distributed over that circle, so it cannot be discrete.

6 For illustration purposes we are extending the discussion here to discrete distributions.

Theorem 22.3.5 (Simulating the Observables from the Sufficient Statistic). Let
T: ℝ^d → ℝ^{d'} be Borel measurable and let f_{Y|M=1}(·), . . . , f_{Y|M=M}(·) be M densities
on ℝ^d. Then the following two statements are equivalent:

(a) T(·) forms a sufficient statistic for the given densities.

(b) To every t in ℝ^{d'} there corresponds a distribution on ℝ^d such that the
following holds: for every m ∈ {1, . . . , M}, if Y = y_obs is generated according
to the density f_{Y|M=m}(·) and if the random vector Ỹ is then generated
according to the distribution corresponding to t, where t = T(y_obs), then Ỹ is
of density f_{Y|M=m}(·).

Proof. For a measure-theoretic statement and proof see (Lehmann and Romano,
2005, Theorem 2.6.1). Here we only present some intuition. Ignoring some of the
technical details, the proof is very simple. The sufficiency of T(·) is equivalent
to M ⊸− T(Y) ⊸− Y forming a Markov chain for every prior on M. This latter
condition is equivalent by Proposition 22.3.3 (cf. (c)) to the conditional distribution
of Y given (T(Y), M) being the same as given T(Y) only. This latter condition
is equivalent to the conditional distribution of Y given T(Y) not depending on
which density in the family {f_{Y|M=m}(·)}_{m∈ℳ} was used to generate Y, i.e., to the
existence of a conditional distribution of Y given T(Y) that does not depend on
m ∈ ℳ. □



22.4 Identifying Sufficient Statistics 

Often a sufficient statistic can be identified without having to compute and factorize 
the conditional densities of the observation. A number of such cases are described 
in this section. 



22.4.1 Invertible Transformation 

We begin by showing that, ignoring some technical details, any invertible transfor- 
mation forms a sufficient statistic. It may not be a particularly helpful sufficient 
statistic because it does not "summarize" the observation, but it is a sufficient 
statistic nonetheless. 

Proposition 22.4.1 (Reversible Transformations Yield Sufficient Statistics). If
T: ℝ^d → ℝ^{d'} is Borel measurable with a Borel measurable inverse, then T(·) forms
a sufficient statistic for guessing M based on Y.

Proof. We provide two proofs. The first uses the definition. We need to verify that
from T(y_obs) one can compute the conditional distribution of M given Y = y_obs.
This is obvious because if t = T(y_obs), then one can compute Pr[M = m | Y = y_obs]
from t by first applying the inverse T^{−1}(t) to recover y_obs and by then substituting
the result in the expression for Pr[M = m | Y = y_obs] (21.12).

A second proof can be based on Proposition 22.3.4. We need to verify that for any
prior {π_m}

M ⊸− T(Y) ⊸− Y

forms a Markov chain. To this end we note that, by Theorem 20.11.3, it suffices
to verify that M and Y are conditionally independent given T(Y). This is clear
because the invertibility of T(·) guarantees that, conditional on T(Y), the random
vector Y is deterministic and hence independent of any random variable and a
fortiori of M. □

22.4.2 A Sufficient Statistic Is Computable from the Statistic 

Intuitively, we think about T(·) as forming a sufficient statistic if T(Y) contains
all the information about Y that is relevant to guessing M. For this intuition to
make sense it had better be the case that if T(·) forms a sufficient statistic for
guessing M based on Y, and if T(Y) is computable from S(Y), then S(·) also
forms a sufficient statistic. Fortunately, this is so:

Proposition 22.4.2. Suppose that a Borel measurable mapping T: ℝ^d → ℝ^{d'} forms
a sufficient statistic for the M densities {f_{Y|M=m}(·)}_{m∈ℳ} on ℝ^d. Let the mapping
S: ℝ^d → ℝ^{d''} be Borel measurable. If T(·) can be written as the composition ψ ∘ S
of S with some Borel measurable function ψ: ℝ^{d''} → ℝ^{d'}, then S(·) also forms a
sufficient statistic for these densities.

Proof. We need to show that Pr[M = m | Y = y_obs] is computable from S(y_obs).
This follows because, by assumption, T(y_obs) is computable from S(y_obs) and
because the sufficiency of T(·) implies that Pr[M = m | Y = y_obs] is computable
from T(y_obs). □

22.4.3 Establishing Sufficiency in Two Steps 

It is sometimes convenient to establish sufficiency in two steps: in the first step 
we establish that T(Y) is sufficient for guessing M based on Y, and in the second 
step we establish that S(T) is sufficient for guessing M based on T(Y). The 
next proposition demonstrates that it then follows that S(T(Y)) forms a sufficient 
statistic for guessing M based on Y. 

Proposition 22.4.3. If T: ℝ^d → ℝ^{d'} forms a sufficient statistic for the M densities
{f_{Y|M=m}(·)}_{m∈ℳ} and if S: ℝ^{d'} → ℝ^{d''} forms a sufficient statistic for the
corresponding family of densities of T(Y), then the composition S ∘ T forms a sufficient
statistic for the densities {f_{Y|M=m}(·)}_{m∈ℳ}.

Proof. We shall establish the sufficiency of S ∘ T by proving that for any prior {π_m}

M ⊸− S(T(Y)) ⊸− Y.

This follows because for every m ∈ ℳ and every y_obs ∈ ℝ^d

\[
\Pr\bigl[M = m \bigm| S(T(\mathbf{Y})) = S(T(\mathbf{y}_{\mathrm{obs}}))\bigr]
  = \Pr\bigl[M = m \bigm| T(\mathbf{Y}) = T(\mathbf{y}_{\mathrm{obs}})\bigr]
  = \Pr\bigl[M = m \bigm| \mathbf{Y} = \mathbf{y}_{\mathrm{obs}}\bigr],
\]

where the first equality follows from the sufficiency of S(T(Y)) for guessing M
based on T(Y), and where the second equality follows from the sufficiency of T(Y)
for guessing M based on Y. □

22.4.4 Guessing whether M Lies in a Given Subset of ℳ

We motivate the next result with the following example, which arises in the
detection of PAM signals in white Gaussian noise (Section 28.3 ahead). Suppose that
the distribution of the observable Y is determined by the value of a k-tuple of bits
(D_1, . . . , D_k). Thus, to each of the 2^k values that the k-tuple (D_1, . . . , D_k) can take,
there corresponds a distribution on Y of some given density f_{Y|D_1=d_1,...,D_k=d_k}(·).
Suppose now that T(·) forms a sufficient statistic for this family of M = 2^k
densities. The result we next describe guarantees that T(·) is also sufficient for the
binary hypothesis testing problem of guessing whether a specific bit D_j is zero or
one. More precisely, we shall show that if {π_{(d_1,...,d_k)}} is any nondegenerate prior
on the 2^k different k-tuples of bits, then T(·) forms a sufficient statistic for the two
densities

\[
\sum_{\substack{(d_1,\ldots,d_k) \\ d_j = 0}} \pi_{(d_1,\ldots,d_k)} \,
  f_{\mathbf{Y}|D_1=d_1,\ldots,D_k=d_k}(\mathbf{y}),
\qquad
\sum_{\substack{(d_1,\ldots,d_k) \\ d_j = 1}} \pi_{(d_1,\ldots,d_k)} \,
  f_{\mathbf{Y}|D_1=d_1,\ldots,D_k=d_k}(\mathbf{y}).
\]

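Proposition 22.4.4 (Guessing whether M Is in 𝒦). Let T: ℝ^d → ℝ^{d'} form a
sufficient statistic for the M densities {f_{Y|M=m}(·)}_{m∈ℳ}. Let the set 𝒦 ⊂ ℳ be a
nonempty strict subset of ℳ. Let {π_m} be a prior on ℳ satisfying

\[
0 < \sum_{m \in \mathcal{K}} \pi_m < 1.
\]

Then T(·) forms a sufficient statistic for the two densities

\[
\mathbf{y} \mapsto \sum_{m \in \mathcal{K}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y})
\quad \text{and} \quad
\mathbf{y} \mapsto \sum_{m \notin \mathcal{K}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}).
\tag{22.37}
\]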

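Proof. By the Factorization Theorem it follows that the sufficiency of T(·) for the
family {f_{Y|M=m}(·)}_{m∈ℳ} is equivalent to the condition that for every m ∈ ℳ and
for every y ∉ 𝒴_0

\[
f_{\mathbf{Y}|M=m}(\mathbf{y}) = g_m\bigl(T(\mathbf{y})\bigr)\, h(\mathbf{y}),
\tag{22.38}
\]

where the set 𝒴_0 ⊂ ℝ^d is of Lebesgue measure zero; where {g_m(·)}_{m∈ℳ} are
nonnegative Borel measurable functions from ℝ^{d'}; and where h(·) is a nonnegative
Borel measurable function from ℝ^d. Consequently,

\[
\sum_{m \in \mathcal{K}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y})
  = \sum_{m \in \mathcal{K}} \pi_m\, g_m\bigl(T(\mathbf{y})\bigr)\, h(\mathbf{y})
  = \Bigl( \sum_{m \in \mathcal{K}} \pi_m\, g_m\bigl(T(\mathbf{y})\bigr) \Bigr) h(\mathbf{y}),
  \quad \mathbf{y} \notin \mathcal{Y}_0,
\tag{22.39a}
\]

and

\[
\sum_{m \notin \mathcal{K}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y})
  = \sum_{m \notin \mathcal{K}} \pi_m\, g_m\bigl(T(\mathbf{y})\bigr)\, h(\mathbf{y})
  = \Bigl( \sum_{m \notin \mathcal{K}} \pi_m\, g_m\bigl(T(\mathbf{y})\bigr) \Bigr) h(\mathbf{y}),
  \quad \mathbf{y} \notin \mathcal{Y}_0.
\tag{22.39b}
\]

The factorization (22.39) of the densities in (22.37) proves that T(·) is also sufficient
for these two densities. □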

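Note 22.4.5. The proposition also extends to more general partitions as follows.
Suppose that T(·) is sufficient for the family {f_{Y|M=m}(·)}_{m∈ℳ}. Let 𝒦_1, . . . , 𝒦_κ be
disjoint nonempty subsets of ℳ whose union is equal to ℳ, and let the prior {π_m}
be such that

\[
\sum_{m \in \mathcal{K}_j} \pi_m > 0, \quad j \in \{1, \ldots, \kappa\}.
\]

Then T(·) is sufficient for the κ densities

\[
\mathbf{y} \mapsto \sum_{m \in \mathcal{K}_1} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}),
\;\ldots,\;
\mathbf{y} \mapsto \sum_{m \in \mathcal{K}_\kappa} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}).
\]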

22.4.5 Conditionally Independent Observations 

Our next result deals with a situation where we need to guess M based on two
observations: Y_1 and Y_2. We assume that T_1(Y_1) forms a sufficient statistic for
guessing M when only Y_1 is observed, and that T_2(Y_2) forms a sufficient statistic
for guessing M when only Y_2 is observed. It is tempting to conjecture that in
this case the pair (T_1(Y_1), T_2(Y_2)) must form a sufficient statistic for guessing M
when both Y_1 and Y_2 are observed. But, without additional assumptions, this is
not the case. An example where this fails can be constructed as follows. Let M
and Z be independent with M taking on the values 0 and 1 equiprobably and with
Z ~ N(0, 1). Suppose that Y_1 = M + Z and that Y_2 = Z. In this case the invertible
mapping T_1(Y_1) = Y_1 forms a sufficient statistic for guessing M based on Y_1 alone,
and the mapping T_2(Y_2) = 17 forms a sufficient statistic for guessing M based
on Y_2 alone (because M and Z are independent). Nevertheless, the pair (Y_1, 17) is
not sufficient for guessing M based on the pair (Y_1, Y_2). Basing one's guess of M on
(Y_1, 17) is not as good as basing it on the pair (Y_1, Y_2). (The reader is encouraged
to verify that Y_1 − Y_2 is sufficient for guessing M based on (Y_1, Y_2) and that M
can be guessed error-free from Y_1 − Y_2.)
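
The counterexample can be illustrated with a short simulation. The Python
sketch below is only an illustration with hypothetical function names. It
generates Y_1 = M + Z and Y_2 = Z, guesses M from the difference Y_1 − Y_2, and,
for comparison, guesses M from Y_1 alone by thresholding at 1/2, which is the
natural rule for a uniform prior and which is all that the pair (Y_1, 17)
allows.

```python
import random


def observe(m, rng=random):
    """Return (y1, y2) with y1 = m + z and y2 = z, where z is N(0, 1) noise."""
    z = rng.gauss(0.0, 1.0)
    return m + z, z


def guess_from_pair(y1, y2):
    """Guess M from both observations: y1 - y2 equals m up to rounding error."""
    return round(y1 - y2)


def guess_from_y1(y1):
    """Guess M from y1 alone by comparing it with the midpoint 1/2."""
    return 1 if y1 > 0.5 else 0
```

Over many trials the guess based on the pair is always correct, whereas the
guess based on Y_1 alone errs in a sizable fraction of the trials.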

The additional assumption we need is that Y_1 and Y_2 be conditionally independent
given M. (It would make no sense to assume that they are independent, because
they are presumably both related to M.) This assumption is valid in many
applications. For example, it occurs when a signal is received at two different antennas
with the additive noises in the two antennas being independent.

Proposition 22.4.6 (Conditionally Independent Observations). Let the mapping
T_1: ℝ^{d_1} → ℝ^{d_1'} form a sufficient statistic for guessing M based on the observation
Y_1 ∈ ℝ^{d_1}, and let T_2: ℝ^{d_2} → ℝ^{d_2'} form a sufficient statistic for guessing M based on
the observation Y_2 ∈ ℝ^{d_2}. If Y_1 and Y_2 are conditionally independent given M,
then the pair (T_1(Y_1), T_2(Y_2)) forms a sufficient statistic for guessing M based on
the pair (Y_1, Y_2).

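Proof. The proof we offer is based on the Factorization Theorem. The hypothesis
that T_1: ℝ^{d_1} → ℝ^{d_1'} forms a sufficient statistic for guessing M based on the
observation Y_1 implies the existence of nonnegative functions {g_m^{(1)}}_{m∈ℳ} and h^{(1)} and
a subset 𝒴_0^{(1)} ⊂ ℝ^{d_1} of Lebesgue measure zero such that

\[
f_{\mathbf{Y}_1|M=m}(\mathbf{y}_1)
  = g_m^{(1)}\bigl(T_1(\mathbf{y}_1)\bigr)\, h^{(1)}(\mathbf{y}_1),
  \quad m \in \mathcal{M},\; \mathbf{y}_1 \notin \mathcal{Y}_0^{(1)}.
\tag{22.40}
\]

Similarly, the hypothesis that T_2(·) is sufficient for guessing M based on Y_2
implies the existence of nonnegative functions {g_m^{(2)}}_{m∈ℳ} and h^{(2)} and a subset of
Lebesgue measure zero 𝒴_0^{(2)} ⊂ ℝ^{d_2} such that

\[
f_{\mathbf{Y}_2|M=m}(\mathbf{y}_2)
  = g_m^{(2)}\bigl(T_2(\mathbf{y}_2)\bigr)\, h^{(2)}(\mathbf{y}_2),
  \quad m \in \mathcal{M},\; \mathbf{y}_2 \notin \mathcal{Y}_0^{(2)}.
\tag{22.41}
\]

The conditional independence of Y_1 and Y_2 given M implies⁷

\[
f_{\mathbf{Y}_1,\mathbf{Y}_2|M=m}(\mathbf{y}_1,\mathbf{y}_2)
  = f_{\mathbf{Y}_1|M=m}(\mathbf{y}_1)\, f_{\mathbf{Y}_2|M=m}(\mathbf{y}_2),
  \quad m \in \mathcal{M},\; \mathbf{y}_1 \in \mathbb{R}^{d_1},\; \mathbf{y}_2 \in \mathbb{R}^{d_2}.
\tag{22.42}
\]

Combining (22.40), (22.41), and (22.42), we obtain

\[
f_{\mathbf{Y}_1,\mathbf{Y}_2|M=m}(\mathbf{y}_1,\mathbf{y}_2)
  = \underbrace{g_m^{(1)}\bigl(T_1(\mathbf{y}_1)\bigr)\, g_m^{(2)}\bigl(T_2(\mathbf{y}_2)\bigr)}_{g_m(T_1(\mathbf{y}_1),\,T_2(\mathbf{y}_2))}
    \;\underbrace{h^{(1)}(\mathbf{y}_1)\, h^{(2)}(\mathbf{y}_2)}_{h(\mathbf{y}_1,\mathbf{y}_2)},
  \quad m \in \mathcal{M},\; \mathbf{y}_1 \notin \mathcal{Y}_0^{(1)},\; \mathbf{y}_2 \notin \mathcal{Y}_0^{(2)}.
\tag{22.43}
\]

The set of pairs (y_1, y_2) ∈ ℝ^{d_1} × ℝ^{d_2} for which y_1 is in 𝒴_0^{(1)} and/or y_2 is in 𝒴_0^{(2)}
is of Lebesgue measure zero, and consequently, the factorization (22.43) implies
that the pair (T_1(Y_1), T_2(Y_2)) forms a sufficient statistic for guessing M based on
(Y_1, Y_2). □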

22.5 Irrelevant Data 

Closely related to the notion of sufficient statistics is the notion of irrelevant data. 
This notion is particularly useful when we think about the data as consisting of 
two parts. Heuristically speaking, we say that the second part of the data is 
irrelevant for guessing M given the first, if it adds no information about M that 
is not already contained in the first part. In such cases the second part of the 
data can be ignored. It should be emphasized that the question whether a part 
of the observation is irrelevant depends not only on its dependence on the random 
variable to be guessed but also on the other part of the observation. 

Definition 22.5.1 (Irrelevant Data). We say that R is irrelevant for guessing M
given Y, if Y forms a sufficient statistic for guessing M based on (Y, R).

Equivalently, R is irrelevant for guessing M given Y, if for any prior {π_m} on ℳ

M ⊸− Y ⊸− (Y, R), (22.44)

i.e.,

M ⊸− Y ⊸− R. (22.45)



7 Technically speaking, this must only hold outside a set of Lebesgue measure zero, but we do
not want to make things even more cumbersome.

Example 22.5.2. Let H take on the values 0 and 1, and assume that, conditional
on H = 0, the observation Y is N(0, σ_0²) and that, conditional on H = 1, it is
N(0, σ_1²). Rather than thinking of this problem as a decision problem with a single
observation, let us think of it as a decision problem with two observations (Y_1, Y_2),
where Y_1 is the absolute value of Y, and where Y_2 is the sign of Y. Thus Y = Y_1 Y_2,
where Y_1 ≥ 0 and Y_2 ∈ {+1, −1}. (The probability that Y = 0 is zero under each
hypothesis, so we need not define the sign of zero.) We now show that Y_2 (= the
sign of Y) is irrelevant data for guessing H given Y_1 (= the magnitude of Y). Or,
in other words, the magnitude of Y is a sufficient statistic for guessing H based on
(Y_1, Y_2). Indeed the likelihood-ratio function

\[
\mathrm{LR}(y_1, y_2)
  = \frac{f_{Y_1,Y_2|H=0}(y_1,y_2)}{f_{Y_1,Y_2|H=1}(y_1,y_2)}
  = \frac{\dfrac{1}{\sqrt{2\pi\sigma_0^2}} \exp\Bigl(-\dfrac{(y_1 y_2)^2}{2\sigma_0^2}\Bigr)}
         {\dfrac{1}{\sqrt{2\pi\sigma_1^2}} \exp\Bigl(-\dfrac{(y_1 y_2)^2}{2\sigma_1^2}\Bigr)}
  = \frac{\sigma_1}{\sigma_0} \exp\Bigl(\frac{y_1^2}{2\sigma_1^2} - \frac{y_1^2}{2\sigma_0^2}\Bigr)
\]

can be computed from the magnitude y_1 only, so Y_1 is a sufficient statistic for
guessing H based on (Y_1, Y_2).
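
The claim of Example 22.5.2, that the likelihood ratio depends on the
observation only through its magnitude, can be checked numerically. The sketch
below is illustrative; the function names and the parameter names sigma0 and
sigma1 are assumptions introduced here. The first function evaluates the ratio
of the two zero-mean Gaussian densities at y = y1 * y2, and the second evaluates
the closed form that uses the magnitude only.

```python
import math


def likelihood_ratio(y1, y2, sigma0, sigma1):
    """Ratio of the N(0, sigma0^2) and N(0, sigma1^2) densities at y = y1 * y2."""
    y = y1 * y2
    f0 = math.exp(-y**2 / (2 * sigma0**2)) / math.sqrt(2 * math.pi * sigma0**2)
    f1 = math.exp(-y**2 / (2 * sigma1**2)) / math.sqrt(2 * math.pi * sigma1**2)
    return f0 / f1


def likelihood_ratio_from_magnitude(y1, sigma0, sigma1):
    """Closed-form likelihood ratio that uses the magnitude y1 only."""
    exponent = y1**2 / (2 * sigma1**2) - y1**2 / (2 * sigma0**2)
    return (sigma1 / sigma0) * math.exp(exponent)
```

For any magnitude y1, the first function returns the same value for the sign +1
and for the sign −1, and this value agrees with the second function up to
floating-point rounding.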

The following two notes clarify that the notion of irrelevance is different from that 
of statistical independence. Neither implies the other. 

Note 22.5.3. A RV can be independent of the RV that we wish to guess and yet 
not be irrelevant. 



Proof. We provide an example of a RV R that is independent of the RV H that
we wish to guess and that is nonetheless not irrelevant. Suppose that H takes on
the values 0 and 1, and assume that under both hypotheses Y ~ Bernoulli(1/2):

\[
\Pr[Y = 1 \mid H = 0] = \Pr[Y = 1 \mid H = 1] = \frac{1}{2}.
\]

Further assume that under H = 0 the RV R is given by 0 ⊕ Y = Y, whereas under
H = 1 it is given by 1 ⊕ Y. Here ⊕ denotes the exclusive-or operation or mod-2
addition.

The distribution of R does not depend on the hypothesis; it is Bernoulli(1/2) both
conditional on H = 0 and conditional on H = 1. But R is not irrelevant for
guessing H given Y. In fact, if we had to guess H based on Y only, our probability
of error would be 1/2. But if we base our decision on Y and R, then our probability
of error is zero because

H = Y ⊕ R. □
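
Because everything in this example is binary, both claims can be verified by
enumeration. The following sketch is illustrative and its function names are
hypothetical. It tabulates the conditional PMF of the pair (Y, R) under each
hypothesis and the resulting marginal of R.

```python
from fractions import Fraction


def joint_pmf_given_h(h):
    """PMF of (Y, R) given H = h, where Y ~ Bernoulli(1/2) and R = h XOR Y."""
    return {(y, h ^ y): Fraction(1, 2) for y in (0, 1)}


def marginal_of_r(h):
    """PMF of R alone given H = h."""
    pmf = {0: Fraction(0), 1: Fraction(0)}
    for (y, r), prob in joint_pmf_given_h(h).items():
        pmf[r] += prob
    return pmf
```

The marginal of R is the same under both hypotheses, so R alone carries no
information about H; yet for every pair (y, r) of positive probability the
exclusive-or of y and r recovers h.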

Note 22.5.4. A RV can be irrelevant even if it is statistically dependent on the
RV that we wish to guess.

Proof. As an example, consider the case where R is equal to Y with probability one
and where Y (and hence also R) is statistically dependent on the RV M that we wish
to guess. Since R is deterministically equal to Y, it follows that, conditional on Y, 
the random variable R is deterministic. Consequently, since a deterministic RV is 
independent of every RV, it follows that M and R are conditionally independent 
given Y, i.e., that (22.45) holds. Thus, even though in this example R is statistically 
dependent on M, it is irrelevant for guessing M given Y . The intuitive explanation 
is that, in this example, R is irrelevant for guessing M given Y not because it 
conveys no information about M (it does!) but because it conveys no information 
about M that is not already conveyed by Y. □ 

Condition (22.44) is often difficult to establish directly, especially when the distri- 
bution of the pair (R, Y) is specified in terms of its conditional density given M, 
because in this case the conditional law of (M, R) given Y can be unwieldy. In 
some cases the following proposition can be used to establish that R is irrelevant. 

Proposition 22.5.5 (A Condition that Implies Irrelevance). Suppose that the
conditional law of R given M = m does not depend on m and that, for each m ∈ ℳ,
we have that, conditionally on M = m, the observations Y and R are independent.
Then R is irrelevant for guessing M given Y.

Proof. We provide the proof for the case where the pair (Y, R) has a conditional
density given M. The discrete case or the mixed case (where one has a conditional
density and the other a conditional PMF) can be treated with the same approach.
To prove this proposition we shall demonstrate that Y is a sufficient statistic for
guessing M based on (Y, R) using the Factorization Theorem. To that end, we
express the conditional density of (Y, R) as

\[
\begin{aligned}
f_{\mathbf{Y},\mathbf{R}|M=m}(\mathbf{y}, \mathbf{r})
  &= f_{\mathbf{Y}|M=m}(\mathbf{y})\, f_{\mathbf{R}|M=m}(\mathbf{r}) \\
  &= f_{\mathbf{Y}|M=m}(\mathbf{y})\, f_{\mathbf{R}}(\mathbf{r}) \\
  &= g_m(\mathbf{y})\, h(\mathbf{y}, \mathbf{r}),
\end{aligned}
\tag{22.46}
\]

where the first equality follows from the conditional independence of Y and R
given M; the second from the hypothesis that the conditional density of R given
M = m does not depend on m and by denoting this density by f_R(·); and the
final equality follows by defining g_m(y) = f_{Y|M=m}(y) and h(y, r) = f_R(r). The
factorization (22.46) demonstrates that Y forms a sufficient statistic for guessing M
based on (Y, R), i.e., that R is irrelevant for guessing M given Y. □



22.6 Testing with Random Parameters 

The notions of sufficient statistics and irrelevance also apply when testing in the
presence of a random parameter. If the random parameter Θ is not observed,
then T(Y) is sufficient if, and only if, for any prior {π_m} on ℳ

M ⊸− T(Y) ⊸− Y. (22.47)

If Θ is of density f_Θ(·) and independent of M, then, as in (20.101), we can express
the conditional density of Y given M = m as

\[
f_{\mathbf{Y}|M=m}(\mathbf{y})
  = \int f_{\mathbf{Y}|\Theta=\theta,M=m}(\mathbf{y})\, f_{\Theta}(\theta)\, \mathrm{d}\theta,
\]

so T(·) forms a sufficient statistic if, and only if, it forms a sufficient statistic for
the M densities

\[
\Bigl\{ \mathbf{y} \mapsto \int f_{\mathbf{Y}|\Theta=\theta,M=m}(\mathbf{y})\, f_{\Theta}(\theta)\, \mathrm{d}\theta \Bigr\}_{m \in \mathcal{M}}.
\]

Similarly, R is irrelevant for guessing M given Y if, and only if,

M ⊸− Y ⊸− R

forms a Markov chain for every prior {π_m} on ℳ.

If the parameter Θ is observed, then T(Y, Θ) is a sufficient statistic if, and only if,
for any prior {π_m} on ℳ

M ⊸− T(Y, Θ) ⊸− (Y, Θ).

If Θ is independent of M and of density f_Θ(·), then the density f_{Y,Θ|M=m}(·) can
be expressed, as in (20.104), as

\[
f_{\mathbf{Y},\Theta|M=m}(\mathbf{y}, \theta)
  = f_{\Theta}(\theta)\, f_{\mathbf{Y}|\Theta=\theta,M=m}(\mathbf{y}),
\]

so T(·) forms a sufficient statistic if, and only if, it forms a sufficient statistic for
the M densities

\[
\bigl\{ (\mathbf{y}, \theta) \mapsto f_{\Theta}(\theta)\, f_{\mathbf{Y}|\Theta=\theta,M=m}(\mathbf{y}) \bigr\}_{m \in \mathcal{M}}.
\]

Similarly, R is irrelevant for guessing M given (Y, Θ) if, and only if,

M ⊸− (Y, Θ) ⊸− R.

The following lemma provides an easily-verifiable condition that guarantees that R
is irrelevant for guessing M given Y, irrespective of whether the random
parameter Θ is observed or not.

Lemma 22.6.1. If for any prior {π_m} on ℳ we have that R is independent of the
triplet (M, Θ, Y),⁸ then R is irrelevant for guessing M given (Θ, Y) and also for
guessing M given Y.

Proof. To prove the lemma when Θ is observed, we need to show that the
independence of R and the triplet (M, Θ, Y) implies

M ⊸− (Y, Θ) ⊸− R,

i.e., that the conditional distribution of R given (Y, Θ) is the same as given
(M, Y, Θ). This is indeed the case because R is independent of (M, Y, Θ) so
the two conditional distributions are equal to the unconditional distribution of R.

To prove the lemma in the case where Θ is unobserved, we need to show that the
independence of R and the triplet (M, Θ, Y) implies that

M ⊸− Y ⊸− R.

Again, one can do so by noting that the conditional distribution of R given Y is
equal to the conditional distribution of R given (Y, M) because both are equal to
the unconditional distribution of R. □

8 Note that being independent of the triplet is a stronger condition than being independent of
each of the members of the triplet!



22.7 Additional Reading 

The classical definition of sufficient statistic as a mapping T(·) such that the
distribution of Y given (T(Y), M = m) does not depend on m is due to R. A. Fisher.
A. N. Kolmogorov defined T(·) to be sufficient if for every prior {π_m} the a
posteriori distribution of M given Y can be computed from T(Y). In our setting
where M takes on a finite number of values the two definitions are equivalent. For
an example where the definitions differ, see (Blackwell and Ramamoorthi, 1982). 

For a discussion of pairwise sufficiency and its relation to sufficiency, see (Halmos 
and Savage, 1949). 

22.8 Exercises 

Exercise 22.1 (Another Proof of Proposition 22.4.6). Give an alternative proof of Propo- 
sition 22.4.6 using Theorem 22.3.5. 

Exercise 22.2 (Hypothesis Testing with Two Observations). Let H take on the values
0 and 1 equiprobably. Let Y_1 be a random vector taking value in ℝ², and let Y_2 be a
random variable. Conditional on H = 0,

Y_1 = μ + Z_1, Y_2 = α + Z_2,

and, conditional on H = 1,

Y_1 = −μ + Z_1, Y_2 = −α + Z_2.

Here H, Z_1, and Z_2 are independent with the components of Z_1 being IID N(0, 1),
with Z_2 being a mean-one exponential, and with μ ∈ ℝ² and α ∈ ℝ being deterministic.

(i) Find an optimal rule for guessing H based on Y_1. Find a one-dimensional sufficient
statistic.

(ii) Find an optimal rule for guessing H based on Y_2.

(iii) Find a two-dimensional sufficient statistic (T_1, T_2) for guessing H based on (Y_1, Y_2).

(iv) Find an optimal rule for guessing H based on the pair (T_1, T_2).

Exercise 22.3 (Sufficient Statistics and the Bhattacharyya Bound). Show that if the
mapping T: ℝ^d → ℝ^{d'} is a sufficient statistic for the densities f_{Y|H=0}(·) and f_{Y|H=1}(·),
and if T = T(Y) is of conditional densities f_{T|H=0}(·) and f_{T|H=1}(·), then

\[
\frac{1}{2} \int_{\mathbb{R}^d} \sqrt{f_{\mathbf{Y}|H=0}(\mathbf{y})\, f_{\mathbf{Y}|H=1}(\mathbf{y})}\; \mathrm{d}\mathbf{y}
  = \frac{1}{2} \int_{\mathbb{R}^{d'}} \sqrt{f_{\mathbf{T}|H=0}(\mathbf{t})\, f_{\mathbf{T}|H=1}(\mathbf{t})}\; \mathrm{d}\mathbf{t}.
\]

Hint: You may want to first derive the identity

\[
\int_{\mathbb{R}^d} \sqrt{f_{\mathbf{Y}|H=0}(\mathbf{y})\, f_{\mathbf{Y}|H=1}(\mathbf{y})}\; \mathrm{d}\mathbf{y}
  = \mathsf{E}\Biggl[ \biggl( \frac{f_{\mathbf{Y}|H=0}(\mathbf{Y})}{f_{\mathbf{Y}|H=1}(\mathbf{Y})} \biggr)^{1/2} \Biggm| H = 1 \Biggr].
\]



Exercise 22.4 (Sufficient Statistics and Irrelevant Data). 

(i) Show that if the hypotheses of Proposition 22.5.5 are satisfied, then the random 
variables Y and R must be independent also when one does not condition on M. 

(ii) Show that the conditions for irrelevance in that proposition are not necessary. 

Exercise 22.5 (Two More Characterizations of Sufficient Statistics). Let P_{Y|H=0}(·) and
P_{Y|H=1}(·) be probability mass functions on the finite set 𝒴. We say that T(Y) forms a
sufficient statistic for guessing H based on Y if H ⊸− T(Y) ⊸− Y for every prior on H.
Show that each of the following conditions is equivalent to T(Y) forming a sufficient
statistic for guessing H based on Y:

(a) For every y ∈ 𝒴 satisfying P_{Y|H=0}(y) + P_{Y|H=1}(y) > 0 we have

\[
\frac{P_{Y|H=0}(y)}{P_{Y|H=1}(y)} = \frac{P_{T|H=0}\bigl(T(y)\bigr)}{P_{T|H=1}\bigl(T(y)\bigr)},
\]

where we adopt the convention (20.39).

(b) For every prior (π_0, π_1) on H there exists a decision rule that bases its decision on
π_0, π_1, and T(Y) and that is optimal for guessing H based on Y.

Exercise 22.6 (Pairwise Sufficiency Implies Sufficiency). Prove Proposition 22.3.2 in the 
case where the conditional densities of the observable given each of the hypotheses are 
positive. 

Exercise 22.7 (Simulating the Observable). In all the examples we gave in Section 22.3.4
the random vector Ỹ was generated from T(y_obs) uniformly over the set of vectors ξ
in ℝ^d satisfying T(ξ) = T(y_obs). Provide an example where this is not the case.

Hint: The setup of Proposition 22.5.5 might be useful. 

Exercise 22.8 (Densities with Zeros). Conditional on H = 0, the d components of Y are
IID and uniformly distributed over the interval [α_0, β_0]. Conditional on H = 1, they are
IID and uniformly distributed over the interval [α_1, β_1]. Show that the tuple

(max{Y^{(1)}, . . . , Y^{(d)}}, min{Y^{(1)}, . . . , Y^{(d)}})

forms a sufficient statistic for guessing H based on Y.

Exercise 22.9 (Optimality Does Not Imply Sufficiency). Let H take value in the set
{0, 1}, and let d = 2. Suppose that

Y_j = (1 − 2H) + Θ Z_j, j = 1, . . . , d,

where H, Θ, Z_1, . . . , Z_d are independent with Θ taking on the distinct positive values σ_0
and σ_1 with probability p_0 and p_1 respectively, and with Z_1, . . . , Z_d being IID N(0, 1).
Let T = Σ_j Y_j.

(i) Show that T forms a sufficient statistic for guessing H based on Y_1, . . . , Y_d when Θ
is observed.

(ii) Show that T does not form a sufficient statistic for guessing H based on Y_1, . . . , Y_d
when Θ is not observed.

(iii) Show that notwithstanding Part (ii), if H has a uniform prior, then the decision
rule that guesses "H = 0" whenever T > 0 is optimal both when Θ is observed and
when it is not observed.

Exercise 22.10 (Markovity Implies Markovity). Suppose that for every prior on M

(M, A) ⊸− T(Y) ⊸− Y

forms a Markov chain, where M takes value in the set ℳ = {1, . . . , M}, where A and Y
are random vectors, and where T(·) is Borel measurable. Does this imply that T(·) forms
a sufficient statistic for guessing M based on Y?



Chapter 23 

The Multivariate Gaussian Distribution 

23.1 Introduction 

The multivariate Gaussian distribution is arguably the most important multi- 
variate distribution in Digital Communications. It is the extension of the univariate 
Gaussian distribution from scalars to vectors. A random vector of this distribu- 
tion is said to be a Gaussian vector, and its components are said to be jointly 
Gaussian. In this chapter we shall define this distribution, provide some useful 
characterizations, and study some of its key properties. To emphasize its con- 
nection to the univariate distribution, we shall derive it along the same lines we 
followed in deriving the univariate Gaussian distribution in Chapter 19. 

There are a number of equivalent ways to define the multivariate Gaussian distri- 
bution, and authors typically pick one definition and then proceed over the course 
of numerous pages to derive alternate characterizations. We shall also proceed in 
this way, but to satisfy the impatient reader's curiosity we shall state the various 
equivalent definitions in this section. The proof of their equivalence will be spread 
over the whole chapter. 

In the following definition we use the notation introduced in Section 17.2. In
particular, all vectors are column vectors, and we denote the components of the
vector a ∈ ℝ^n by a^{(1)}, . . . , a^{(n)}.

Definition 23.1.1 (Standard Gaussians, Centered Gaussians, and Gaussians).

(i) A random vector W taking value in ℝ^n is said to be a standard Gaussian
if its n components W^{(1)}, . . . , W^{(n)} are independent and each is a zero-mean
unit-variance univariate Gaussian.

(ii) A random vector X taking value in ℝ^n is said to be a centered Gaussian
if there exists some deterministic n × m matrix A such that the distribution
of X is the same as the distribution of AW, i.e.,

X = AW, (23.1)

where W is a standard Gaussian with m components.

(iii) A random vector X taking value in ℝ^n is said to be Gaussian if there exists
some deterministic n × m matrix A and some deterministic vector μ ∈ ℝ^n
such that the distribution of X is equal to the distribution of AW + μ, i.e., if

X = AW + μ, (23.2)

where W is a standard Gaussian with m components.

The random vectors AW + μ and X can have identical laws only if they have
identical mean vectors. As we shall see, the linearity of expectation and the fact
that a standard Gaussian is of zero mean imply that the mean vector of AW + μ
is equal to μ. Thus, AW + μ and X can have identical laws only if μ = E[X].
Consequently, X is a Gaussian random vector if, and only if, for some A and W
as above X = AW + E[X]. Stated differently, X is a Gaussian random vector if,
and only if, X − E[X] is a centered Gaussian.
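
Definition 23.1.1 is constructive, so it translates directly into a sampling
procedure. The sketch below is only an illustration that uses the Python
standard library; the function names and the list-of-rows matrix
representation are assumptions made here. It draws a standard Gaussian W with m
components and returns AW + μ.

```python
import random


def standard_gaussian(m, rng=random):
    """Draw a standard Gaussian vector W with m independent N(0, 1) components."""
    return [rng.gauss(0.0, 1.0) for _ in range(m)]


def gaussian_vector(A, mu, rng=random):
    """Draw X = A W + mu, where A is an n x m matrix given as a list of rows."""
    m = len(A[0])
    w = standard_gaussian(m, rng)
    return [
        sum(a_ij * w_j for a_ij, w_j in zip(row, w)) + mu_i
        for row, mu_i in zip(A, mu)
    ]
```

Averaging many such draws gives an empirical mean vector close to μ, in line
with the observation that the mean vector of AW + μ is μ.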

While Definition 23.1.1 allows for the matrix A to be rectangular, we shall see in 
Corollary 23.6.13 that every centered Gaussian can be generated from a standard 
Gaussian by multiplication by a square matrix. That is, if X is an n-dimensional 
centered Gaussian, then there exists an n x n square matrix A such that X = AW,
where W is a standard Gaussian. 

In fact, we shall see in Theorem 23.6.14 that we can even limit ourselves to square 
matrices that are the product of an orthogonal matrix by a diagonal matrix. Since 
multiplying W by a diagonal matrix merely scales its components while leaving 
them independent and Gaussian, it follows that X is a centered Gaussian if, and 
only if, its law is the same as the law of the result of applying an orthogonal 
transformation to a random vector whose components are independent zero-mean 
univariate Gaussians (not necessarily of equal variance). 

In view of Definition 23.1.1, it is not surprising that applying a linear transfor- 
mation to a Gaussian vector results in a Gaussian vector ((23.43) ahead). The 
reverse is perhaps more surprising: X is a Gaussian vector if, and only if, the re- 
sult of applying any deterministic linear functional to X has a univariate Gaussian 
distribution (Theorem 23.6.17 ahead). 

We conclude this section with the following pact with the reader. 

(i) Unless preceded by the word "random" or "Gaussian," all scalars, vectors, 
and matrices in this chapter are deterministic. 

(ii) Unless preceded by the word "complex," all scalars, vectors, and matrices in 
this chapter are real. 

But, without violating this pact, we shall sometimes get excited and throw in the 
words "real" and "deterministic" even when unnecessary. 

23.2 Notation and Preliminaries 

Our notation in this chapter expands upon the one introduced in Section 17.2. To
minimize page flipping, we repeat here parts of that section.

Deterministic vectors are denoted by boldface lowercase letters such as $\mathbf{w}$, whereas
random vectors are denoted by boldface uppercase letters such as $\mathbf{W}$. When we
deal with deterministic matrices we make an exception to our rule of trying to
denote deterministic quantities by lowercase letters.$^1$ Thus, deterministic matrices
are denoted by uppercase letters. But to make it clear that we are dealing with
a deterministic matrix and not a scalar random variable, we use special fonts to
distinguish the two. Thus $\mathsf{A}$ denotes a deterministic matrix, whereas $A$ denotes a
random variable. Random matrices, which only appear briefly in this book, are
denoted by uppercase letters of yet another font, e.g., $\mathbb{H}$.

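An $n \times m$ deterministic real matrix $\mathsf{A}$ is an array of real numbers having $n$ rows
and $m$ columns

$$\mathsf{A} = \begin{pmatrix} a^{(1,1)} & a^{(1,2)} & \cdots & a^{(1,m)} \\ a^{(2,1)} & a^{(2,2)} & \cdots & a^{(2,m)} \\ \vdots & \vdots & \ddots & \vdots \\ a^{(n,1)} & a^{(n,2)} & \cdots & a^{(n,m)} \end{pmatrix}.$$

The Row-$j$ Column-$\ell$ element of the matrix $\mathsf{A}$ is denoted

$$a^{(j,\ell)} \quad \text{or} \quad [\mathsf{A}]_{j,\ell}.$$

The transpose of an $n \times m$ matrix $\mathsf{A}$ is the $m \times n$ matrix $\mathsf{A}^{\mathsf{T}}$ whose Row-$j$ Column-$\ell$
entry is equal to the Row-$\ell$ Column-$j$ entry of $\mathsf{A}$:

$$[\mathsf{A}^{\mathsf{T}}]_{j,\ell} = [\mathsf{A}]_{\ell,j}, \quad j \in \{1, \ldots, m\}, \; \ell \in \{1, \ldots, n\}.$$

We shall repeatedly use the fact that if the matrix-product $\mathsf{A}\mathsf{B}$ is defined (i.e., if
the number of columns of $\mathsf{A}$ is the same as the number of rows of $\mathsf{B}$), then the
transpose of the product is the product of the transposes in reverse order

$$(\mathsf{A}\mathsf{B})^{\mathsf{T}} = \mathsf{B}^{\mathsf{T}} \mathsf{A}^{\mathsf{T}}. \tag{23.3}$$

The $n \times n$ identity matrix whose diagonal elements are all 1 and whose off-diagonal
elements are all 0 is denoted $\mathsf{I}_n$. The all-zero matrix whose components
are all zero is denoted $\mathsf{0}$.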

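An $n \times 1$ matrix is an $n$-vector, or a vector for short. Thus, unless otherwise
specified, all the vectors we shall encounter are column vectors.$^2$ The components
of an $n$-vector $\mathbf{a}$ are denoted by $a^{(1)}, \ldots, a^{(n)}$ so

$$\mathbf{a} = \begin{pmatrix} a^{(1)} \\ \vdots \\ a^{(n)} \end{pmatrix}$$

or, in a typographically more efficient form,

$$\mathbf{a} = \bigl(a^{(1)}, \ldots, a^{(n)}\bigr)^{\mathsf{T}}.$$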

1 We have already made some exceptions to this rule when we dealt with deterministic
constants that are by convention always denoted using uppercase letters, e.g., bandwidth W,
amplitude A, baud period $T_s$, etc.

2 An exception to this rule is in our treatment of linear codes where the tradition of using row
vectors is too strong to change.

The vector whose components are all zero is denoted by $\mathbf{0}$. The square root of the
sum of the squares of the components of a real $n$-vector $\mathbf{a}$ is denoted by $\|\mathbf{a}\|$:

$$\|\mathbf{a}\| \triangleq \sqrt{\sum_{\ell=1}^{n} \bigl(a^{(\ell)}\bigr)^2}, \quad \mathbf{a} \in \mathbb{R}^n. \tag{23.4}$$

If $\mathbf{a} = (a^{(1)}, \ldots, a^{(n)})^{\mathsf{T}}$ and $\mathbf{b} = (b^{(1)}, \ldots, b^{(n)})^{\mathsf{T}}$, then$^3$

$$\mathbf{a}^{\mathsf{T}}\mathbf{b} = \sum_{\ell=1}^{n} a^{(\ell)} b^{(\ell)} = \mathbf{b}^{\mathsf{T}}\mathbf{a}.$$

In particular,

$$\|\mathbf{a}\|^2 = \sum_{\ell=1}^{n} \bigl(a^{(\ell)}\bigr)^2 = \mathbf{a}^{\mathsf{T}}\mathbf{a}. \tag{23.5}$$

Note the difference between $\mathbf{a}^{\mathsf{T}}\mathbf{a}$ and $\mathbf{a}\mathbf{a}^{\mathsf{T}}$: the former is the scalar $\|\mathbf{a}\|^2$ whereas
the latter is the $n \times n$ matrix whose Row-$j$ Column-$\ell$ element is $a^{(j)} a^{(\ell)}$.
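To make the distinction between the inner product and the outer product concrete, here is a minimal Python sketch. The helper names `inner` and `outer` and the example vector are our own illustrative choices; vectors are represented as plain lists.

```python
# Inner product a^T b (a scalar) versus outer product a b^T (a matrix).

def inner(a, b):
    """Return the scalar a^T b, i.e., the sum over l of a^(l) b^(l)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return sum(x * y for x, y in zip(a, b))


def outer(a, b):
    """Return the matrix whose Row-j Column-l element is a^(j) b^(l)."""
    return [[x * y for y in b] for x in a]


a = [3.0, 4.0]
squared_norm = inner(a, a)   # a^T a = ||a||^2, a scalar
norm = squared_norm ** 0.5   # ||a||
a_at = outer(a, a)           # a a^T, a 2 x 2 matrix
```

Here `squared_norm` is a single number, whereas `a_at` is a symmetric matrix with the squared components of the vector on its diagonal.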

The determinant of a square matrix $\mathsf{A}$ is denoted by $\det \mathsf{A}$. We note that a matrix
and its transpose have equal determinants

$$\det(\mathsf{A}^{\mathsf{T}}) = \det \mathsf{A}, \tag{23.6}$$

and that the determinant of the product of two square matrices is the product of
the determinants

$$\det(\mathsf{A}\mathsf{B}) = \det(\mathsf{A}) \det(\mathsf{B}). \tag{23.7}$$

We say that a square $n \times n$ matrix $\mathsf{A}$ is singular if its determinant is zero or,
equivalently, if its columns are linearly dependent or, equivalently, if its rows are
linearly dependent or, equivalently, if there exists some nonzero vector $\boldsymbol{\alpha} \in \mathbb{R}^n$
such that $\mathsf{A}\boldsymbol{\alpha} = \mathbf{0}$.



23.3 Some Results on Matrices 

We next survey some of the results from Matrix Theory that we shall be using. Par- 
ticularly important to us are results on positive semidefinite matrices, because, as 
we shall see in Proposition 23.6.1, every covariance matrix is positive semidefinite, 
and every positive semidefinite matrix is the covariance matrix of some random 
vector. 



3 In (20.84) we denoted $\mathbf{a}^{\mathsf{T}}\mathbf{b}$ by $\langle \mathbf{a}, \mathbf{b} \rangle_E$.

23.3.1 Orthogonal Matrices

Definition 23.3.1 (Orthogonal Matrices). An $n \times n$ real matrix $\mathsf{U}$ is said to be
orthogonal if

$$\mathsf{U}\mathsf{U}^{\mathsf{T}} = \mathsf{I}_n. \tag{23.8}$$

As proved in (Axler, 1997, Chapter 7, Theorem 7.36), the condition (23.8) is
equivalent to the condition

$$\mathsf{U}^{\mathsf{T}}\mathsf{U} = \mathsf{I}_n. \tag{23.9}$$

Thus, a real matrix is orthogonal if, and only if, its transpose is orthogonal. From
(23.8) and (23.9) we also obtain:

Note 23.3.2. The inverse of an orthogonal matrix is its transpose. 
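If we write an $n \times n$ matrix $\mathsf{U}$ in terms of its columns as

$$\mathsf{U} = \begin{pmatrix} \boldsymbol{\psi}_1 & \cdots & \boldsymbol{\psi}_n \end{pmatrix},$$

then (23.9) can be expressed as

$$\mathsf{I}_n = \mathsf{U}^{\mathsf{T}}\mathsf{U} = \begin{pmatrix} \boldsymbol{\psi}_1^{\mathsf{T}} \\ \vdots \\ \boldsymbol{\psi}_n^{\mathsf{T}} \end{pmatrix} \begin{pmatrix} \boldsymbol{\psi}_1 & \cdots & \boldsymbol{\psi}_n \end{pmatrix} = \begin{pmatrix} \boldsymbol{\psi}_1^{\mathsf{T}}\boldsymbol{\psi}_1 & \cdots & \boldsymbol{\psi}_1^{\mathsf{T}}\boldsymbol{\psi}_n \\ \vdots & \ddots & \vdots \\ \boldsymbol{\psi}_n^{\mathsf{T}}\boldsymbol{\psi}_1 & \cdots & \boldsymbol{\psi}_n^{\mathsf{T}}\boldsymbol{\psi}_n \end{pmatrix},$$

thus showing that a real $n \times n$ matrix $\mathsf{U}$ is orthogonal if, and only if, its $n$ columns
$\boldsymbol{\psi}_1, \ldots, \boldsymbol{\psi}_n$ satisfy

$$\boldsymbol{\psi}_{\nu}^{\mathsf{T}} \boldsymbol{\psi}_{\nu'} = \mathrm{I}\{\nu = \nu'\}, \quad \nu, \nu' \in \{1, \ldots, n\}. \tag{23.10}$$

Using the same argument but starting with (23.8) we can prove a similar result
about the rows of an orthogonal matrix: if the rows of a real $n \times n$ matrix $\mathsf{U}$ are
denoted by $\boldsymbol{\phi}_1^{\mathsf{T}}, \ldots, \boldsymbol{\phi}_n^{\mathsf{T}}$, i.e.,

$$\mathsf{U} = \begin{pmatrix} \boldsymbol{\phi}_1^{\mathsf{T}} \\ \vdots \\ \boldsymbol{\phi}_n^{\mathsf{T}} \end{pmatrix},$$

then $\mathsf{U}$ is orthogonal if, and only if,

$$\boldsymbol{\phi}_{\nu}^{\mathsf{T}} \boldsymbol{\phi}_{\nu'} = \mathrm{I}\{\nu = \nu'\}, \quad \nu, \nu' \in \{1, \ldots, n\}. \tag{23.11}$$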

Recalling that the determinant of a product of square matrices is the product of 
the determinants and that the determinant of a matrix is equal to the determinant 
of its transpose, we obtain that for every square matrix U 

$$\det(\mathsf{U}\mathsf{U}^{\mathsf{T}}) = (\det \mathsf{U})^2. \tag{23.12}$$

Consequently, by taking the determinant of both sides of (23.8) we obtain that the
determinant of an orthogonal matrix must be either $+1$ or $-1$. It should, however,
be noted that there are numerous examples of matrices of unit determinant that
are not orthogonal.

We leave it to the reader to verify that a $2 \times 2$ matrix is orthogonal if, and only if,
it is equal to one of the following matrices for some choice of $-\pi \le \theta < \pi$

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \qquad \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}. \tag{23.13}$$

The former matrix corresponds to a rotation by $\theta$ and has determinant $+1$, and
the latter to a reflection followed by a rotation

$$\begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

and has determinant $-1$.
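The claims about 2 x 2 orthogonal matrices can be checked numerically. The following is only a sketch in plain Python: the function names, the angle 0.7, and the shear matrix used as a non-orthogonal matrix of unit determinant are our own illustrative choices.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def is_orthogonal(m, tol=1e-12):
    """Check U U^T = I_n, as in (23.8), up to a numerical tolerance."""
    p = matmul(m, transpose(m))
    n = len(m)
    return all(abs(p[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


def rotation(theta):
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]


def reflection_then_rotation(theta):
    return [[math.cos(theta), math.sin(theta)],
            [math.sin(theta), -math.cos(theta)]]


theta = 0.7
rot = rotation(theta)
ref = reflection_then_rotation(theta)
# The second family is a rotation applied after the reflection diag(1, -1).
ref_factored = matmul(rot, [[1.0, 0.0], [0.0, -1.0]])
# A shear has determinant 1 but is not orthogonal.
shear = [[1.0, 1.0], [0.0, 1.0]]
```

Both `rot` and `ref` pass `is_orthogonal`, with determinants +1 and -1 respectively, whereas `shear` has unit determinant and fails the orthogonality check.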

23.3.2 Symmetric Matrices 

A matrix A is said to be symmetric if it is equal to its transpose: 

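$$\mathsf{A}^{\mathsf{T}} = \mathsf{A}.$$

Only square matrices can be symmetric. A vector $\boldsymbol{\psi} \in \mathbb{R}^n$ is said to be an
eigenvector of the matrix $\mathsf{A}$ corresponding to the real eigenvalue $\lambda \in \mathbb{R}$ if $\boldsymbol{\psi}$ is nonzero
and if $\mathsf{A}\boldsymbol{\psi} = \lambda\boldsymbol{\psi}$. The following is a key result about the eigenvectors of symmetric
real matrices.

Proposition 23.3.3 (Eigenvectors and Eigenvalues of Symmetric Real Matrices).

If $\mathsf{A}$ is a symmetric real $n \times n$ matrix, then $\mathsf{A}$ has $n$ (not necessarily distinct)
real eigenvalues $\lambda_1, \ldots, \lambda_n \in \mathbb{R}$ with corresponding eigenvectors $\boldsymbol{\psi}_1, \ldots, \boldsymbol{\psi}_n \in \mathbb{R}^n$
satisfying

$$\boldsymbol{\psi}_{\nu}^{\mathsf{T}} \boldsymbol{\psi}_{\nu'} = \mathrm{I}\{\nu = \nu'\}, \quad \nu, \nu' \in \{1, \ldots, n\}. \tag{23.14}$$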

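Proof. See, for example, (Axler, 1997, Chapter 7, Theorem 7.13, p. 136), or
(Herstein, 2001, Section 6.10, pp. 346-348), or (Horn and Johnson, 1985, Chapter 4,
Section 1, Theorem 4.5.1). □

The vectors $\boldsymbol{\psi}_1, \ldots, \boldsymbol{\psi}_n$ are eigenvectors of the matrix $\mathsf{A}$ corresponding to the
eigenvalues $\lambda_1, \ldots, \lambda_n$ if

$$\mathsf{A}\boldsymbol{\psi}_{\nu} = \lambda_{\nu} \boldsymbol{\psi}_{\nu}, \quad \nu \in \{1, \ldots, n\}. \tag{23.15}$$

We next express this in an alternative way. We begin by noting that

$$\mathsf{A} \begin{pmatrix} \boldsymbol{\psi}_1 & \cdots & \boldsymbol{\psi}_n \end{pmatrix} = \begin{pmatrix} \mathsf{A}\boldsymbol{\psi}_1 & \cdots & \mathsf{A}\boldsymbol{\psi}_n \end{pmatrix}$$

and that

$$\begin{pmatrix} \boldsymbol{\psi}_1 & \cdots & \boldsymbol{\psi}_n \end{pmatrix} \begin{pmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_n \end{pmatrix} = \begin{pmatrix} \lambda_1\boldsymbol{\psi}_1 & \cdots & \lambda_n\boldsymbol{\psi}_n \end{pmatrix}.$$

Consequently Condition (23.15) can be written as

$$\mathsf{A}\mathsf{U} = \mathsf{U}\mathsf{\Lambda}, \tag{23.16}$$

where

$$\mathsf{U} = \begin{pmatrix} \boldsymbol{\psi}_1 & \cdots & \boldsymbol{\psi}_n \end{pmatrix} \quad \text{and} \quad \mathsf{\Lambda} = \begin{pmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_n \end{pmatrix}. \tag{23.17}$$

Condition (23.14) is equivalent to the condition that the above matrix $\mathsf{U}$ is
orthogonal. By multiplying (23.16) from the right by the inverse of $\mathsf{U}$ (which, because $\mathsf{U}$
is orthogonal and by (23.8), is $\mathsf{U}^{\mathsf{T}}$) we obtain the equivalent form $\mathsf{A} = \mathsf{U}\mathsf{\Lambda}\mathsf{U}^{\mathsf{T}}$.
Consequently, an equivalent statement of Proposition 23.3.3 is: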

Proposition 23.3.4 (Spectral Theorem for Real Symmetric Matrices). A symmetric
real $n \times n$ matrix $\mathsf{A}$ can be written in the form

$$\mathsf{A} = \mathsf{U}\mathsf{\Lambda}\mathsf{U}^{\mathsf{T}}$$

where, as in (23.17), $\mathsf{\Lambda}$ is a diagonal real $n \times n$ matrix whose diagonal elements
are the eigenvalues of $\mathsf{A}$, and where $\mathsf{U}$ is a real $n \times n$ orthogonal matrix whose $\nu$-th
column is an eigenvector of $\mathsf{A}$ corresponding to the eigenvalue in the $\nu$-th position
on the diagonal of $\mathsf{\Lambda}$.

The reverse is also true: if $\mathsf{A} = \mathsf{U}\mathsf{\Lambda}\mathsf{U}^{\mathsf{T}}$ for a real diagonal matrix $\mathsf{\Lambda}$ and for a real
orthogonal matrix $\mathsf{U}$, then $\mathsf{A}$ is symmetric, its eigenvalues are the diagonal elements
of $\mathsf{\Lambda}$, and the $\nu$-th column of $\mathsf{U}$ is an eigenvector of the matrix $\mathsf{A}$ corresponding to
the eigenvalue in the $\nu$-th position on the diagonal of $\mathsf{\Lambda}$.




23.3.3 Positive Semidefinite Matrices 

Definition 23.3.5 (Positive Semidefinite and Positive Definite Matrices). 

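(i) We say that the $n \times n$ real matrix $\mathsf{K}$ is positive semidefinite or
nonnegative definite and write

$$\mathsf{K} \succeq 0$$

if $\mathsf{K}$ is symmetric and

$$\boldsymbol{\alpha}^{\mathsf{T}}\mathsf{K}\boldsymbol{\alpha} \ge 0, \quad \boldsymbol{\alpha} \in \mathbb{R}^n.$$

(ii) We say that the $n \times n$ real matrix $\mathsf{K}$ is positive definite and write

$$\mathsf{K} \succ 0$$

if $\mathsf{K}$ is symmetric and

$$\boldsymbol{\alpha}^{\mathsf{T}}\mathsf{K}\boldsymbol{\alpha} > 0, \quad \boldsymbol{\alpha} \ne \mathbf{0}, \; \boldsymbol{\alpha} \in \mathbb{R}^n.$$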

The following two propositions characterize positive semidefinite and positive def- 
inite matrices. For proofs, see (Axler, 1997, Chapter 7, Theorem 7.27). 

Proposition 23.3.6 (Characterizing Positive Semidefinite Matrices). Let $\mathsf{K}$ be a
real $n \times n$ matrix. Then the statement that $\mathsf{K}$ is positive semidefinite is equivalent
to each of the following statements:

(a) The matrix $\mathsf{K}$ can be written in the form

$$\mathsf{K} = \mathsf{S}^{\mathsf{T}}\mathsf{S} \tag{23.18}$$

for some real $n \times n$ matrix $\mathsf{S}$.$^4$

(b) The matrix $\mathsf{K}$ is symmetric and all its eigenvalues are nonnegative.

(c) The matrix $\mathsf{K}$ can be written in the form

$$\mathsf{K} = \mathsf{U}\mathsf{\Lambda}\mathsf{U}^{\mathsf{T}}, \tag{23.19}$$

where $\mathsf{\Lambda}$ is a real $n \times n$ diagonal matrix with nonnegative entries on the
diagonal and where $\mathsf{U}$ is a real $n \times n$ orthogonal matrix.

Proposition 23.3.7 (Characterizing Positive Definite Matrices). Let $\mathsf{K}$ be a real
$n \times n$ matrix. Then the statement that $\mathsf{K}$ is positive definite is equivalent to each
of the following statements.

(a) The matrix $\mathsf{K}$ can be written in the form $\mathsf{K} = \mathsf{S}^{\mathsf{T}}\mathsf{S}$ for some real $n \times n$
nonsingular matrix $\mathsf{S}$.

(b) The matrix $\mathsf{K}$ is symmetric and all its eigenvalues are positive.

(c) The matrix $\mathsf{K}$ can be written in the form

$$\mathsf{K} = \mathsf{U}\mathsf{\Lambda}\mathsf{U}^{\mathsf{T}},$$

where $\mathsf{\Lambda}$ is a real $n \times n$ diagonal matrix with positive entries on the diagonal
and where $\mathsf{U}$ is a real $n \times n$ orthogonal matrix.

4 Even if $\mathsf{S}$ is not a square matrix, $\mathsf{S}^{\mathsf{T}}\mathsf{S} \succeq 0$.

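Given a positive semidefinite matrix $\mathsf{K}$, how can we find a matrix $\mathsf{S}$ satisfying
$\mathsf{K} = \mathsf{S}^{\mathsf{T}}\mathsf{S}$? In general, there can be many such matrices. For example, if $\mathsf{K}$ is the
identity matrix, then $\mathsf{S}$ can be any orthogonal matrix. We mention here two useful
choices. Being symmetric, the matrix $\mathsf{K}$ can be written in the form

$$\mathsf{K} = \mathsf{U}\mathsf{\Lambda}\mathsf{U}^{\mathsf{T}}, \tag{23.20}$$

where $\mathsf{U}$ and $\mathsf{\Lambda}$ are as in (23.17). Since $\mathsf{K}$ is positive semidefinite, the diagonal
elements of $\mathsf{\Lambda}$ (which are the eigenvalues of $\mathsf{K}$) are nonnegative. Consequently, we
can define the matrix

$$\mathsf{\Lambda}^{1/2} = \begin{pmatrix} \sqrt{\lambda_1} & & & 0 \\ & \sqrt{\lambda_2} & & \\ & & \ddots & \\ 0 & & & \sqrt{\lambda_n} \end{pmatrix}.$$

One choice of the matrix $\mathsf{S}$ is

$$\mathsf{S} = \mathsf{\Lambda}^{1/2}\mathsf{U}^{\mathsf{T}}. \tag{23.21}$$

Indeed, with this definition of $\mathsf{S}$ we have

$$\begin{aligned} \mathsf{S}^{\mathsf{T}}\mathsf{S} &= \bigl(\mathsf{\Lambda}^{1/2}\mathsf{U}^{\mathsf{T}}\bigr)^{\mathsf{T}} \mathsf{\Lambda}^{1/2}\mathsf{U}^{\mathsf{T}} \\ &= \mathsf{U}\mathsf{\Lambda}^{1/2}\mathsf{\Lambda}^{1/2}\mathsf{U}^{\mathsf{T}} \\ &= \mathsf{U}\mathsf{\Lambda}\mathsf{U}^{\mathsf{T}} \\ &= \mathsf{K}, \end{aligned}$$

where the first equality follows from the definition of $\mathsf{S}$; the second from the rule
$(\mathsf{A}\mathsf{B})^{\mathsf{T}} = \mathsf{B}^{\mathsf{T}}\mathsf{A}^{\mathsf{T}}$ and from the symmetry of the diagonal matrix $\mathsf{\Lambda}^{1/2}$; the third from
the definition of $\mathsf{\Lambda}^{1/2}$; and where the final equality follows from (23.20).

A different choice for $\mathsf{S}$, which will be less useful to us in this chapter, is$^5$

$$\mathsf{U}\mathsf{\Lambda}^{1/2}\mathsf{U}^{\mathsf{T}}.$$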
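As a small numerical illustration of the two choices of S, consider the 2 x 2 matrix with rows (2, 1) and (1, 2). This matrix is our own example, not one from the text; its eigenvalues are 3 and 1 with orthonormal eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2). The sketch below, in plain Python with hypothetical helper names, forms both candidates for S and leaves the check that each satisfies K = S^T S to a helper.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(x - y) <= tol
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Illustrative positive definite matrix (an assumption of this sketch).
K = [[2.0, 1.0], [1.0, 2.0]]

# Its spectral decomposition K = U Lambda U^T, worked out by hand.
r = 1.0 / math.sqrt(2.0)
U = [[r, r], [r, -r]]          # columns are the orthonormal eigenvectors
eigenvalues = [3.0, 1.0]
sqrt_lambda = [[math.sqrt(eigenvalues[0]), 0.0],
               [0.0, math.sqrt(eigenvalues[1])]]

# First choice, as in (23.21): S = Lambda^{1/2} U^T.
S = matmul(sqrt_lambda, transpose(U))

# Second choice: the symmetric square root U Lambda^{1/2} U^T.
S_sym = matmul(matmul(U, sqrt_lambda), transpose(U))

first_ok = close(matmul(transpose(S), S), K)
second_ok = close(matmul(transpose(S_sym), S_sym), K)
```

Both flags come out true. Note that `S` is not symmetric, whereas `S_sym` is.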



The following lemmas will be used in Section 23.4.3 when we study random vectors 
of singular covariance matrices. 

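Lemma 23.3.8. Let $\mathsf{K}$ be a real $n \times n$ positive semidefinite matrix, and let $\boldsymbol{\alpha}$ be a
vector in $\mathbb{R}^n$. Then $\boldsymbol{\alpha}^{\mathsf{T}}\mathsf{K}\boldsymbol{\alpha} = 0$ if, and only if, $\mathsf{K}\boldsymbol{\alpha} = \mathbf{0}$.

5 This is the only choice for $\mathsf{S}$ that is positive semidefinite (Axler, 1997, Chapter 7,
Proposition 7.28), (Horn and Johnson, 1985, Chapter 7, Section 7.2, Theorem 7.2.6).

Proof. One direction is trivial and does not require that $\mathsf{K}$ be positive semidefinite:
if $\mathsf{K}\boldsymbol{\alpha} = \mathbf{0}$, then $\boldsymbol{\alpha}^{\mathsf{T}}\mathsf{K}\boldsymbol{\alpha}$ must also be equal to zero. Indeed, in this case we have by
the associativity of matrix multiplication $\boldsymbol{\alpha}^{\mathsf{T}}\mathsf{K}\boldsymbol{\alpha} = \boldsymbol{\alpha}^{\mathsf{T}}(\mathsf{K}\boldsymbol{\alpha}) = \boldsymbol{\alpha}^{\mathsf{T}}\mathbf{0} = 0$.

To prove the other direction, we first note that, since $\mathsf{K}$ is positive semidefinite,
there exists some $n \times n$ matrix $\mathsf{S}$ such that $\mathsf{K} = \mathsf{S}^{\mathsf{T}}\mathsf{S}$. Hence,

$$\begin{aligned} \boldsymbol{\alpha}^{\mathsf{T}}\mathsf{K}\boldsymbol{\alpha} &= \boldsymbol{\alpha}^{\mathsf{T}}\mathsf{S}^{\mathsf{T}}\mathsf{S}\boldsymbol{\alpha} \\ &= (\mathsf{S}\boldsymbol{\alpha})^{\mathsf{T}}(\mathsf{S}\boldsymbol{\alpha}) \\ &= \|\mathsf{S}\boldsymbol{\alpha}\|^2, \quad \boldsymbol{\alpha} \in \mathbb{R}^n, \end{aligned}$$

where the second equality follows from the rule for transposing a product (23.3),
and where the third equality follows from (23.5). Consequently, if $\boldsymbol{\alpha}^{\mathsf{T}}\mathsf{K}\boldsymbol{\alpha} = 0$, then
$\|\mathsf{S}\boldsymbol{\alpha}\| = 0$, so $\mathsf{S}\boldsymbol{\alpha} = \mathbf{0}$, and hence $\mathsf{S}^{\mathsf{T}}\mathsf{S}\boldsymbol{\alpha} = \mathbf{0}$, i.e., $\mathsf{K}\boldsymbol{\alpha} = \mathbf{0}$. □

Lemma 23.3.9. If $\mathsf{K}$ is a real $n \times n$ positive definite matrix, then $\boldsymbol{\alpha}^{\mathsf{T}}\mathsf{K}\boldsymbol{\alpha} = 0$ if,
and only if, $\boldsymbol{\alpha} = \mathbf{0}$.

Proof. Follows directly from Definition 23.3.5 of positive definite matrices. □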

23.4 Random Vectors 

23.4.1 Definitions 

Recall that an $n$-dimensional random vector or a random $n$-vector $\mathbf{X}$ defined
over the probability space $(\Omega, \mathcal{F}, P)$ is a (measurable) mapping from the set
of experiment outcomes $\Omega$ to the $n$-dimensional Euclidean space $\mathbb{R}^n$. A random
vector $\mathbf{X}$ is very much like a random variable, except that rather than taking value
in the real line $\mathbb{R}$, it takes value in $\mathbb{R}^n$. In fact, an $n$-dimensional random vector
can be viewed as an array of $n$ random variables.$^6$

The density of a random vector is the joint density of its components. The density
of a random $n$-vector is thus a nonnegative (Borel measurable) function from $\mathbb{R}^n$
to the nonnegative reals that integrates to one.

Similarly, an $n \times m$ random matrix $\mathbb{H}$ is an $n \times m$ array of random variables defined
over a common probability space.

23.4.2 Expectations and Covariance Matrices 

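The expectation $\mathsf{E}[\mathbf{X}]$ of a random $n$-vector $\mathbf{X} = (X^{(1)}, \ldots, X^{(n)})^{\mathsf{T}}$ is a vector
whose components are the expectations of the corresponding components of $\mathbf{X}$:$^7$

$$\mathsf{E}[\mathbf{X}] \triangleq \bigl(\mathsf{E}[X^{(1)}], \ldots, \mathsf{E}[X^{(n)}]\bigr)^{\mathsf{T}}. \tag{23.22}$$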



6 In dealing with random vectors one often abandons the "coordinate free" approach and views 
vectors in a particular coordinate system. This allows one to speak of the covariance matrix in 
more familiar terms. 

7 The expectation of a random vector is only defined if the expectation of each of its
components is defined.

The $j$-th element of $\mathsf{E}[\mathbf{X}]$ is thus the expectation of the $j$-th component of $\mathbf{X}$,
namely, $\mathsf{E}[X^{(j)}]$. Similarly, the expectation of a random matrix is the matrix of
expectations.

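If all the components of a random $n$-vector $\mathbf{X}$ are of finite variance, then we say
that $\mathbf{X}$ is of finite variance. We then define its $n \times n$ covariance matrix $\mathsf{K}_{\mathbf{XX}}$
as

$$\mathsf{K}_{\mathbf{XX}} \triangleq \mathsf{E}\bigl[(\mathbf{X} - \mathsf{E}[\mathbf{X}])(\mathbf{X} - \mathsf{E}[\mathbf{X}])^{\mathsf{T}}\bigr]. \tag{23.23}$$

That is,

$$\begin{aligned} \mathsf{K}_{\mathbf{XX}} &= \mathsf{E}\left[ \begin{pmatrix} X^{(1)} - \mathsf{E}[X^{(1)}] \\ \vdots \\ X^{(n)} - \mathsf{E}[X^{(n)}] \end{pmatrix} \begin{pmatrix} X^{(1)} - \mathsf{E}[X^{(1)}] & \cdots & X^{(n)} - \mathsf{E}[X^{(n)}] \end{pmatrix} \right] \\ &= \begin{pmatrix} \mathrm{Var}[X^{(1)}] & \mathrm{Cov}[X^{(1)}, X^{(2)}] & \cdots & \mathrm{Cov}[X^{(1)}, X^{(n)}] \\ \mathrm{Cov}[X^{(2)}, X^{(1)}] & \mathrm{Var}[X^{(2)}] & \cdots & \mathrm{Cov}[X^{(2)}, X^{(n)}] \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}[X^{(n)}, X^{(1)}] & \mathrm{Cov}[X^{(n)}, X^{(2)}] & \cdots & \mathrm{Var}[X^{(n)}] \end{pmatrix}. \end{aligned} \tag{23.24}$$

If $n = 1$ and the $n$-dimensional random vector $\mathbf{X}$ is hence a scalar, then the covariance
matrix $\mathsf{K}_{\mathbf{XX}}$ is a $1 \times 1$ matrix whose sole component is the variance of the sole
component of $\mathbf{X}$.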

Note that from the $n \times n$ covariance matrix $\mathsf{K}_{\mathbf{XX}}$ of a random $n$-vector $\mathbf{X}$ it is easy
to compute the covariance matrix of a subset of $\mathbf{X}$'s components. For example, if
we are only interested in the $2 \times 2$ covariance matrix of $(X^{(1)}, X^{(2)})^{\mathsf{T}}$, then we just
pick the first two columns and the first two rows of $\mathsf{K}_{\mathbf{XX}}$. More generally, the $r \times r$
covariance matrix of $(X^{(j_1)}, X^{(j_2)}, \ldots, X^{(j_r)})^{\mathsf{T}}$ for $1 \le j_1 < j_2 < \cdots < j_r \le n$ is
obtained from $\mathsf{K}_{\mathbf{XX}}$ by picking Rows and Columns $j_1, \ldots, j_r$. For example, if

$$\mathsf{K}_{\mathbf{XX}} = \begin{pmatrix} 30 & 31 & 9 & 7 \\ 31 & 39 & 11 & 13 \\ 9 & 11 & 9 & 12 \\ 7 & 13 & 12 & 26 \end{pmatrix},$$

then the covariance matrix of $(X^{(2)}, X^{(4)})^{\mathsf{T}}$ is $\begin{pmatrix} 39 & 13 \\ 13 & 26 \end{pmatrix}$.
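Picking rows and columns is a one-line operation in code. The sketch below uses the 4 x 4 matrix of the example; the function name `sub_covariance` is hypothetical, and the indices are 1-based to match the text.

```python
def sub_covariance(K, indices):
    """Covariance matrix of the components listed in `indices`.

    `indices` holds 1-based component numbers j_1 < ... < j_r; the result
    keeps exactly those rows and columns of K.
    """
    return [[K[j - 1][l - 1] for l in indices] for j in indices]


K_XX = [
    [30, 31, 9, 7],
    [31, 39, 11, 13],
    [9, 11, 9, 12],
    [7, 13, 12, 26],
]

first_two = sub_covariance(K_XX, [1, 2])         # components 1 and 2
second_and_fourth = sub_covariance(K_XX, [2, 4])  # components 2 and 4
```

The submatrix inherits the symmetry of the full covariance matrix, because the same index set is used for the rows and for the columns.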

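We next explore the behavior of the mean vector and the covariance matrix of
a random vector when it is multiplied by a deterministic matrix. Regarding the
mean, we shall show that since matrix multiplication is a linear transformation, it
commutes with the expectation operation. Consequently, if $\mathbb{H}$ is a random $n \times m$
matrix and $\mathsf{A}$ is a deterministic $\nu \times n$ matrix, then

$$\mathsf{E}[\mathsf{A}\mathbb{H}] = \mathsf{A}\,\mathsf{E}[\mathbb{H}], \tag{23.25a}$$

and similarly if $\mathsf{B}$ is a deterministic $m \times \nu$ matrix, then

$$\mathsf{E}[\mathbb{H}\mathsf{B}] = \mathsf{E}[\mathbb{H}]\,\mathsf{B}. \tag{23.25b}$$

To prove (23.25a) we write out the Row-$j$ Column-$\ell$ element of the $\nu \times m$ matrix
$\mathsf{E}[\mathsf{A}\mathbb{H}]$ and use the linearity of expectation to relate it to the Row-$j$ Column-$\ell$
element of the matrix $\mathsf{A}\,\mathsf{E}[\mathbb{H}]$:

$$\begin{aligned} \bigl[\mathsf{E}[\mathsf{A}\mathbb{H}]\bigr]_{j,\ell} &= \mathsf{E}\Bigl[\sum_{\kappa=1}^{n} [\mathsf{A}]_{j,\kappa} [\mathbb{H}]_{\kappa,\ell}\Bigr] \\ &= \sum_{\kappa=1}^{n} \mathsf{E}\bigl[[\mathsf{A}]_{j,\kappa} [\mathbb{H}]_{\kappa,\ell}\bigr] \\ &= \sum_{\kappa=1}^{n} [\mathsf{A}]_{j,\kappa}\, \mathsf{E}\bigl[[\mathbb{H}]_{\kappa,\ell}\bigr] \\ &= \bigl[\mathsf{A}\,\mathsf{E}[\mathbb{H}]\bigr]_{j,\ell}, \quad j \in \{1, \ldots, \nu\}, \; \ell \in \{1, \ldots, m\}. \end{aligned}$$

The proof of (23.25b) is almost identical and is omitted.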

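The transpose operation also commutes with expectation: if $\mathbb{H}$ is a random matrix
then

$$\mathsf{E}[\mathbb{H}^{\mathsf{T}}] = \bigl(\mathsf{E}[\mathbb{H}]\bigr)^{\mathsf{T}}. \tag{23.26}$$

As to the covariance matrix, we next show that if $\mathsf{A}$ is a deterministic matrix and
if $\mathbf{X}$ is a random vector, then the covariance matrix $\mathsf{K}_{\mathbf{YY}}$ of the random vector
$\mathbf{Y} = \mathsf{A}\mathbf{X}$ can be expressed in terms of the covariance matrix $\mathsf{K}_{\mathbf{XX}}$ of $\mathbf{X}$ as

$$\mathsf{K}_{\mathbf{YY}} = \mathsf{A}\mathsf{K}_{\mathbf{XX}}\mathsf{A}^{\mathsf{T}}, \quad \mathbf{Y} = \mathsf{A}\mathbf{X}. \tag{23.27}$$

Indeed,

$$\begin{aligned} \mathsf{K}_{\mathbf{YY}} &\triangleq \mathsf{E}\bigl[(\mathbf{Y} - \mathsf{E}[\mathbf{Y}])(\mathbf{Y} - \mathsf{E}[\mathbf{Y}])^{\mathsf{T}}\bigr] \\ &= \mathsf{E}\bigl[(\mathsf{A}\mathbf{X} - \mathsf{E}[\mathsf{A}\mathbf{X}])(\mathsf{A}\mathbf{X} - \mathsf{E}[\mathsf{A}\mathbf{X}])^{\mathsf{T}}\bigr] \\ &= \mathsf{E}\bigl[\mathsf{A}(\mathbf{X} - \mathsf{E}[\mathbf{X}])\bigl(\mathsf{A}(\mathbf{X} - \mathsf{E}[\mathbf{X}])\bigr)^{\mathsf{T}}\bigr] \\ &= \mathsf{E}\bigl[\mathsf{A}(\mathbf{X} - \mathsf{E}[\mathbf{X}])(\mathbf{X} - \mathsf{E}[\mathbf{X}])^{\mathsf{T}}\mathsf{A}^{\mathsf{T}}\bigr] \\ &= \mathsf{A}\,\mathsf{E}\bigl[(\mathbf{X} - \mathsf{E}[\mathbf{X}])(\mathbf{X} - \mathsf{E}[\mathbf{X}])^{\mathsf{T}}\mathsf{A}^{\mathsf{T}}\bigr] \\ &= \mathsf{A}\,\mathsf{E}\bigl[(\mathbf{X} - \mathsf{E}[\mathbf{X}])(\mathbf{X} - \mathsf{E}[\mathbf{X}])^{\mathsf{T}}\bigr]\mathsf{A}^{\mathsf{T}} \\ &= \mathsf{A}\mathsf{K}_{\mathbf{XX}}\mathsf{A}^{\mathsf{T}}. \end{aligned}$$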
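The identities E[AX] = A E[X] and K_YY = A K_XX A^T can be verified exactly on a toy example. In the sketch below we assume, purely for illustration, that X is uniformly distributed over four points in R^3 and that A is a particular 2 x 3 matrix; neither comes from the text. Using `fractions.Fraction` keeps the arithmetic exact, so the two sides can be compared with `==`.

```python
from fractions import Fraction


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def mean(points):
    """Mean vector of a random vector that is uniform over `points`."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def covariance(points):
    """Covariance matrix (23.23) of a random vector uniform over `points`."""
    mu = mean(points)
    n = len(points)
    d = len(mu)
    return [[sum((p[j] - mu[j]) * (p[l] - mu[l]) for p in points) / n
             for l in range(d)] for j in range(d)]


# Hypothetical support of X (four equally likely outcomes) and matrix A.
points = [[Fraction(v) for v in p]
          for p in [(0, 0, 1), (1, 2, 0), (2, 1, 3), (1, 1, 1)]]
A = [[1, 2, 0], [0, -1, 1]]

K_XX = covariance(points)
y_points = [matvec(A, p) for p in points]        # outcomes of Y = A X
K_YY = covariance(y_points)                      # computed directly from Y
predicted = matmul(matmul(A, K_XX), transpose(A))  # A K_XX A^T, as in (23.27)
```

Comparing `K_YY` with `predicted`, and `mean(y_points)` with `matvec(A, mean(points))`, shows exact agreement, as the derivation requires.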



A key property of covariance matrices is that, as we shall next show, they are all 
positive semidefinite. That is, the covariance matrix Kxx of any random vector X 
is a symmetric matrix satisfying 



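$$\boldsymbol{\alpha}^{\mathsf{T}} \mathsf{K}_{\mathbf{XX}}\, \boldsymbol{\alpha} \ge 0, \quad \boldsymbol{\alpha} \in \mathbb{R}^n. \tag{23.28}$$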



(In Proposition 23.6.1 we shall see that this property fully characterizes covari- 
ance matrices: every positive semidefinite matrix is the covariance matrix of some 
random vector.) 

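To prove (23.28) it suffices to consider the case where $\mathbf{X}$ is of zero mean because
the covariance matrix of $\mathbf{X}$ is the same as the covariance matrix of $\mathbf{X} - \mathsf{E}[\mathbf{X}]$. The
symmetry of $\mathsf{K}_{\mathbf{XX}}$ follows from the definition of the covariance matrix (23.23); from
the fact that expectation and transposition commute (23.26); and from the formula
for the transpose of a product of matrices (23.3):

$$\begin{aligned} \mathsf{K}_{\mathbf{XX}}^{\mathsf{T}} &= \bigl(\mathsf{E}[\mathbf{X}\mathbf{X}^{\mathsf{T}}]\bigr)^{\mathsf{T}} \\ &= \mathsf{E}\bigl[(\mathbf{X}\mathbf{X}^{\mathsf{T}})^{\mathsf{T}}\bigr] \\ &= \mathsf{E}[\mathbf{X}\mathbf{X}^{\mathsf{T}}] \\ &= \mathsf{K}_{\mathbf{XX}}. \end{aligned} \tag{23.29}$$

The nonnegativity of $\boldsymbol{\alpha}^{\mathsf{T}} \mathsf{K}_{\mathbf{XX}}\, \boldsymbol{\alpha}$ for any deterministic $\boldsymbol{\alpha} \in \mathbb{R}^n$ follows by noting
that by (23.27) (applied with $\mathsf{A} = \boldsymbol{\alpha}^{\mathsf{T}}$) the term $\boldsymbol{\alpha}^{\mathsf{T}} \mathsf{K}_{\mathbf{XX}}\, \boldsymbol{\alpha}$ is the variance of the
scalar random variable $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}$, i.e.,

$$\boldsymbol{\alpha}^{\mathsf{T}} \mathsf{K}_{\mathbf{XX}}\, \boldsymbol{\alpha} = \mathrm{Var}\bigl[\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}\bigr] \tag{23.30}$$

and, as such, is nonnegative.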

23.4.3 Singular Covariance Matrices 

A random vector having a singular covariance matrix can be unwieldy because it 
cannot have a density function. Indeed, as we shall see in Corollary 23.4.2, any such 
random vector has at least one component that is determined (with probability one) 
by the other components. In this section we shall propose a way of manipulating 
such vectors. Roughly speaking, the idea is that if X has a singular covariance 
matrix, then we choose a subset of its components so that the covariance matrix of 
the chosen subset be nonsingular and so that each component that was not chosen 
be equal (with probability one) to a deterministic affine function of the chosen 
components. We then manipulate only the chosen components and, with some 
deterministic bookkeeping "on the side," take care of the components that were 
not chosen. This idea is made precise in Corollary 23.4.3. 

To illustrate the idea, suppose that $\mathbf{X}$ is a zero-mean random vector of covariance
matrix

$$\mathsf{K}_{\mathbf{XX}} = \begin{pmatrix} 3 & 5 & 7 \\ 5 & 9 & 13 \\ 7 & 13 & 19 \end{pmatrix}.$$

An application of Proposition 23.4.1 ahead will show that because the three columns
of $\mathsf{K}_{\mathbf{XX}}$ satisfy the linear relationship

$$-\begin{pmatrix} 3 \\ 5 \\ 7 \end{pmatrix} + 2\begin{pmatrix} 5 \\ 9 \\ 13 \end{pmatrix} - \begin{pmatrix} 7 \\ 13 \\ 19 \end{pmatrix} = \mathbf{0},$$

it follows that

$$-X^{(1)} + 2X^{(2)} - X^{(3)} = 0, \quad \text{with probability one.}$$

Consequently, in manipulating $\mathbf{X}$ we can pick the two components $X^{(2)}, X^{(3)}$,
which are of nonsingular covariance matrix $\begin{pmatrix} 9 & 13 \\ 13 & 19 \end{pmatrix}$ (obtained by picking the last
two rows and the last two columns of $\mathsf{K}_{\mathbf{XX}}$), and keep track "on the side" of the
fact that $X^{(1)}$ is equal, with probability one, to $2X^{(2)} - X^{(3)}$. We could, of course,
also pick the components $X^{(1)}, X^{(2)}$ of nonsingular covariance matrix $\begin{pmatrix} 3 & 5 \\ 5 & 9 \end{pmatrix}$ and
keep track "on the side" of the relationship $X^{(3)} = 2X^{(2)} - X^{(1)}$.

To avoid cumbersome language, for the remainder of this section we shall take all
equalities between random variables to stand for equalities with probability one.
Thus, if we write $X^{(1)} = 2X^{(2)} - X^{(3)}$ we mean that the probability that $X^{(1)}$ is
equal to $2X^{(2)} - X^{(3)}$ is one.
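The numbers in this illustration can be checked mechanically. The sketch below assumes the 3 x 3 covariance matrix with rows (3, 5, 7), (5, 9, 13), (7, 13, 19); this is the matrix that is consistent with the relation -X(1) + 2X(2) - X(3) = 0 and with the two 2 x 2 submatrices quoted above. The helper names are our own.

```python
def matvec(K, a):
    return [sum(k * x for k, x in zip(row, a)) for row in K]


def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def submatrix(K, indices):
    """Rows and columns of K for the given 1-based component indices."""
    return [[K[j - 1][l - 1] for l in indices] for j in indices]


# Covariance matrix assumed for this check (see the lead-in).
K_XX = [[3, 5, 7], [5, 9, 13], [7, 13, 19]]
alpha = [-1, 2, -1]

K_alpha = matvec(K_XX, alpha)   # -col1 + 2 col2 - col3
variance = sum(a * k for a, k in zip(alpha, K_alpha))   # Var[alpha^T X]

last_two = submatrix(K_XX, [2, 3])    # covariance of (X2, X3)
first_two = submatrix(K_XX, [1, 2])   # covariance of (X1, X2)
```

The vector `K_alpha` is all-zero, so the variance of -X(1) + 2X(2) - X(3) vanishes. The full matrix has zero determinant, while each of the two 2 x 2 submatrices has determinant 2 and is therefore nonsingular.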

The justification of the procedure is in the following proposition and its two corol- 
laries. 

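Proposition 23.4.1. Let $\mathbf{X}$ be a zero-mean random $n$-vector of covariance matrix
$\mathsf{K}_{\mathbf{XX}}$. Then its $\ell$-th component $X^{(\ell)}$ is a deterministic linear combination of
$X^{(\ell_1)}, \ldots, X^{(\ell_\eta)}$ if, and only if, the $\ell$-th column of $\mathsf{K}_{\mathbf{XX}}$ is a linear combination of
Columns $\ell_1, \ldots, \ell_\eta$. Here $\ell, \eta, \ell_1, \ldots, \ell_\eta \in \{1, \ldots, n\}$ are arbitrary.

Proof. If $\ell \in \{\ell_1, \ldots, \ell_\eta\}$, then the result is trivial. We shall therefore present a
proof only for the case where $\ell \notin \{\ell_1, \ldots, \ell_\eta\}$. In this case, the $\ell$-th component of
the random $n$-vector $\mathbf{X}$ is a linear combination of the $\eta$ components $X^{(\ell_1)}, \ldots, X^{(\ell_\eta)}$
if, and only if, there exists a vector $\boldsymbol{\alpha} \in \mathbb{R}^n$ satisfying

$$\alpha^{(\ell)} = -1, \tag{23.31a}$$

$$\alpha^{(\kappa)} = 0, \quad \kappa \notin \{\ell, \ell_1, \ldots, \ell_\eta\}, \tag{23.31b}$$

and

$$\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X} = 0. \tag{23.31c}$$

Since $\mathbf{X}$ is of zero mean, the condition $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X} = 0$ is equivalent to the condition
$\mathrm{Var}[\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}] = 0$. By (23.30) and Lemma 23.3.8 this latter condition is equivalent
to the condition $\mathsf{K}_{\mathbf{XX}}\boldsymbol{\alpha} = \mathbf{0}$. Now $\mathsf{K}_{\mathbf{XX}}\boldsymbol{\alpha}$ is a linear combination of the columns
of $\mathsf{K}_{\mathbf{XX}}$ where the first column is multiplied by $\alpha^{(1)}$, the second by $\alpha^{(2)}$, etc.
Consequently, the condition that $\mathsf{K}_{\mathbf{XX}}\boldsymbol{\alpha} = \mathbf{0}$ for some $\boldsymbol{\alpha} \in \mathbb{R}^n$ satisfying (23.31a) &
(23.31b) is equivalent to the condition that the $\ell$-th column of $\mathsf{K}_{\mathbf{XX}}$ is a linear
combination of Columns $\ell_1, \ldots, \ell_\eta$. □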

Corollary 23.4.2. The covariance matrix of a zero-mean random n-vector X is 
singular if, and only if, some component of X is a linear combination of the other 
components. 

Proof. Follows from Proposition 23.4.1 by noting that a square matrix is singular 
if, and only if, its columns are linearly dependent. □ 

Corollary 23.4.3. Let $\mathbf{X}$ be a zero-mean random $n$-vector of covariance matrix $\mathsf{K}_{\mathbf{XX}}$.
If Columns $\ell_1, \ldots, \ell_d$ of $\mathsf{K}_{\mathbf{XX}}$ form a basis for the subspace of $\mathbb{R}^n$ spanned by the
columns of $\mathsf{K}_{\mathbf{XX}}$, then every component of $\mathbf{X}$ can be written as a linear combination
of the components $X^{(\ell_1)}, \ldots, X^{(\ell_d)}$, and the random $d$-vector $(X^{(\ell_1)}, \ldots, X^{(\ell_d)})^{\mathsf{T}}$
has a nonsingular $d \times d$ covariance matrix.

Proof. Since Columns $\ell_1, \ldots, \ell_d$ form a basis for the subspace spanned by the
columns of $\mathsf{K}_{\mathbf{XX}}$, every column $\ell$ can be written as a linear combination of these
columns. Consequently, by Proposition 23.4.1, every component of $\mathbf{X}$ can be written
as a linear combination of $X^{(\ell_1)}, \ldots, X^{(\ell_d)}$. To prove that the $d \times d$ covariance
matrix $\mathsf{K}_{\tilde{\mathbf{X}}\tilde{\mathbf{X}}}$ of the random $d$-vector $\tilde{\mathbf{X}} = (X^{(\ell_1)}, \ldots, X^{(\ell_d)})^{\mathsf{T}}$ is nonsingular, we
note that if this were not the case, then by Corollary 23.4.2 applied to $\tilde{\mathbf{X}}$ it would
follow that one of the components of $\tilde{\mathbf{X}}$ is a linear combination of the other $d - 1$
components. But by Proposition 23.4.1 applied to $\mathbf{X}$, this would imply that the
columns $\ell_1, \ldots, \ell_d$ of $\mathsf{K}_{\mathbf{XX}}$ are not linearly independent, in contradiction to the
corollary's hypothesis that they form a basis. □



23.4.4 The Characteristic Function 

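If $\mathbf{X}$ is a random $n$-vector, then its characteristic function $\Phi_{\mathbf{X}}(\cdot)$ is a mapping
from $\mathbb{R}^n$ to $\mathbb{C}$ that maps each vector $\boldsymbol{\varpi} = (\varpi^{(1)}, \ldots, \varpi^{(n)})^{\mathsf{T}}$ in $\mathbb{R}^n$ to $\Phi_{\mathbf{X}}(\boldsymbol{\varpi})$, where

$$\Phi_{\mathbf{X}}(\boldsymbol{\varpi}) \triangleq \mathsf{E}\bigl[e^{i\boldsymbol{\varpi}^{\mathsf{T}}\mathbf{X}}\bigr] = \mathsf{E}\Bigl[\exp\Bigl(i \sum_{\ell=1}^{n} \varpi^{(\ell)} X^{(\ell)}\Bigr)\Bigr], \quad \boldsymbol{\varpi} \in \mathbb{R}^n.$$

If $\mathbf{X}$ has the density $f_{\mathbf{X}}(\cdot)$, then

$$\Phi_{\mathbf{X}}(\boldsymbol{\varpi}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{X}}(\mathbf{x})\, e^{i \sum_{\ell=1}^{n} \varpi^{(\ell)} x^{(\ell)}} \,\mathrm{d}x^{(1)} \cdots \mathrm{d}x^{(n)},$$

which is reminiscent of the multi-dimensional Fourier Transform of $f_{\mathbf{X}}(\cdot)$ (ignoring
$2\pi$'s and the sign of $i$).

Proposition 23.4.4 (Identical Distributions and Characteristic Functions). Two
random $n$-vectors $\mathbf{X}, \mathbf{Y}$ are of the same distribution if, and only if, they have
identical characteristic functions:

$$\bigl(\mathbf{X} \stackrel{\mathscr{L}}{=} \mathbf{Y}\bigr) \iff \bigl(\Phi_{\mathbf{X}}(\boldsymbol{\varpi}) = \Phi_{\mathbf{Y}}(\boldsymbol{\varpi}), \; \boldsymbol{\varpi} \in \mathbb{R}^n\bigr). \tag{23.32}$$

Proof. See (Dudley, 2003, Chapter 9, Section 5, Theorem 9.5.1). □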

This proposition is extremely useful. We shall demonstrate its power by using it
to show that two random variables $X$ and $Y$ are independent if, and only if,

$$\mathsf{E}\bigl[e^{i(\varpi_1 X + \varpi_2 Y)}\bigr] = \mathsf{E}\bigl[e^{i\varpi_1 X}\bigr]\, \mathsf{E}\bigl[e^{i\varpi_2 Y}\bigr], \quad \varpi_1, \varpi_2 \in \mathbb{R}. \tag{23.33}$$

One direction is straightforward. If $X$ and $Y$ are independent, then for any Borel
measurable functions $g(\cdot)$ and $h(\cdot)$ the random variables $g(X)$ and $h(Y)$ are also
independent. Thus, the independence of $X$ and $Y$ implies the independence of the
random variables $e^{i\varpi_1 X}$ and $e^{i\varpi_2 Y}$ and hence implies that the expectation of their
product is the product of their expectations:

$$\begin{aligned} \mathsf{E}\bigl[e^{i(\varpi_1 X + \varpi_2 Y)}\bigr] &= \mathsf{E}\bigl[e^{i\varpi_1 X} e^{i\varpi_2 Y}\bigr] \\ &= \mathsf{E}\bigl[e^{i\varpi_1 X}\bigr]\, \mathsf{E}\bigl[e^{i\varpi_2 Y}\bigr], \quad \varpi_1, \varpi_2 \in \mathbb{R}. \end{aligned}$$

As to the other direction, suppose that $X'$ has the same law as $X$, that $Y'$ has the
same law as $Y$, and that $X'$ and $Y'$ are independent. Since $X'$ has the same law
as $X$, it follows that

$$\mathsf{E}\bigl[e^{i\varpi_1 X'}\bigr] = \mathsf{E}\bigl[e^{i\varpi_1 X}\bigr], \quad \varpi_1 \in \mathbb{R}, \tag{23.34}$$

and similarly for $Y'$

$$\mathsf{E}\bigl[e^{i\varpi_2 Y'}\bigr] = \mathsf{E}\bigl[e^{i\varpi_2 Y}\bigr], \quad \varpi_2 \in \mathbb{R}. \tag{23.35}$$

Consequently, since $X'$ and $Y'$ are independent

$$\begin{aligned} \mathsf{E}\bigl[e^{i(\varpi_1 X' + \varpi_2 Y')}\bigr] &= \mathsf{E}\bigl[e^{i\varpi_1 X'} e^{i\varpi_2 Y'}\bigr] \\ &= \mathsf{E}\bigl[e^{i\varpi_1 X'}\bigr]\, \mathsf{E}\bigl[e^{i\varpi_2 Y'}\bigr] \\ &= \mathsf{E}\bigl[e^{i\varpi_1 X}\bigr]\, \mathsf{E}\bigl[e^{i\varpi_2 Y}\bigr], \quad \varpi_1, \varpi_2 \in \mathbb{R}, \end{aligned}$$

where the third equality follows from (23.34) and (23.35).

We thus see that if (23.33) holds, then the characteristic function of the vector
$(X, Y)^{\mathsf{T}}$ is identical to the characteristic function of the vector $(X', Y')^{\mathsf{T}}$. By
Proposition 23.4.4 the joint distribution of $(X, Y)$ must then be the same as the
joint distribution of $(X', Y')$. Since according to the latter distribution the two
components are independent, it follows that the same must be true according to
the former, i.e., $X$ and $Y$ must be independent.

23.5 A Standard Gaussian Vector 

Recall Definition 23.1.1 that a random $n$-vector $\mathbf{W}$ is a standard Gaussian if its $n$
components are independent zero-mean unit-variance Gaussian random variables.
Its density $f_{\mathbf{W}}(\cdot)$ is then given by

$$\begin{aligned} f_{\mathbf{W}}(\mathbf{w}) &= \prod_{\ell=1}^{n} \left( \frac{1}{\sqrt{2\pi}} \exp\Bigl(-\frac{(w^{(\ell)})^2}{2}\Bigr) \right) \\ &= \frac{1}{(2\pi)^{n/2}} \exp\Bigl(-\frac{1}{2} \sum_{\ell=1}^{n} (w^{(\ell)})^2\Bigr) \\ &= (2\pi)^{-n/2}\, e^{-\frac{1}{2}\|\mathbf{w}\|^2}, \quad \mathbf{w} \in \mathbb{R}^n. \end{aligned} \tag{23.36}$$

The definition of a standard Gaussian random vector is an extension of the
definition of a standard Gaussian random variable: the sole component of a standard
one-dimensional Gaussian vector is a scalar $\mathcal{N}(0, 1)$ random variable. Conversely,
every $\mathcal{N}(0, 1)$ random variable can be viewed as a one-dimensional standard
Gaussian.

If $\mathbf{W}$ is a standard Gaussian random $n$-vector then, as we next show, its mean
vector and covariance matrix are given by

$$\mathsf{E}[\mathbf{W}] = \mathbf{0}, \quad \text{and} \quad \mathsf{K}_{\mathbf{WW}} = \mathsf{I}_n. \tag{23.37}$$

Indeed, the mean of a random vector is the vector of the means (23.22), so the
fact that $\mathsf{E}[\mathbf{W}] = \mathbf{0}$ is a consequence of all the components of $\mathbf{W}$ having zero
mean. And using (23.24) it can be easily shown that the covariance matrix of $\mathbf{W}$
is the identity matrix because the components of $\mathbf{W}$ are independent and hence, a
fortiori, uncorrelated, and because they are each of unit variance.
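The two forms of the density in (23.36), the product of univariate densities and the closed form in terms of the squared norm, can be compared numerically. The sketch below uses only the Python standard library; the function names and the test point are our own choices.

```python
import math


def univariate_standard_density(w):
    """Density of a zero-mean unit-variance univariate Gaussian at w."""
    return math.exp(-w * w / 2.0) / math.sqrt(2.0 * math.pi)


def density_as_product(w):
    """First line of (23.36): product over the independent components."""
    result = 1.0
    for component in w:
        result *= univariate_standard_density(component)
    return result


def density_closed_form(w):
    """Last line of (23.36): (2 pi)^(-n/2) exp(-||w||^2 / 2)."""
    n = len(w)
    squared_norm = sum(component * component for component in w)
    return (2.0 * math.pi) ** (-n / 2.0) * math.exp(-squared_norm / 2.0)


w = [0.3, -1.2, 2.0]
product_value = density_as_product(w)
closed_form_value = density_closed_form(w)
```

The two values agree up to floating-point rounding. Since the closed form depends on the argument only through its norm, the density is unchanged when the components are permuted or their signs are flipped.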

23.6 Gaussian Random Vectors 

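Recall Definition 23.1.1 that a random $n$-vector $\mathbf{X}$ is said to be Gaussian if for some
positive integer $m$ there exists an $n \times m$ matrix $\mathsf{A}$; a standard Gaussian random
$m$-vector $\mathbf{W}$; and a deterministic vector $\boldsymbol{\mu} \in \mathbb{R}^n$ such that

$$\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W} + \boldsymbol{\mu}. \tag{23.38}$$

From (23.38), from the second order properties of standard Gaussians (23.37),
and from the behavior of the mean vector and covariance matrix under linear
transformation (23.25a) & (23.27) we obtain

$$\bigl(\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W} + \boldsymbol{\mu} \text{ and } \mathbf{W} \text{ standard}\bigr) \implies \bigl(\mathsf{E}[\mathbf{X}] = \boldsymbol{\mu} \text{ and } \mathsf{K}_{\mathbf{XX}} = \mathsf{A}\mathsf{A}^{\mathsf{T}}\bigr). \tag{23.39}$$

Recall also that $\mathbf{X}$ is a centered Gaussian if $\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W}$ for $\mathsf{A}$ and $\mathbf{W}$ as above.

Every standard Gaussian vector is a centered Gaussian because every standard
Gaussian $n$-vector $\mathbf{W}$ is equal to $\mathsf{A}\mathbf{W}$ when $\mathsf{A}$ is the $n \times n$ identity matrix $\mathsf{I}_n$.
The reverse is not true: not every centered Gaussian is a standard Gaussian.
Indeed, standard Gaussians have the identity covariance matrix (23.37), whereas
the centered Gaussian vector $\mathsf{A}\mathbf{W}$ has, by (23.39), the covariance matrix $\mathsf{A}\mathsf{A}^{\mathsf{T}}$,
which need not be the identity matrix.

Also, $\mathbf{X}$ is a Gaussian vector if, and only if, $\mathbf{X} - \mathsf{E}[\mathbf{X}]$ is a centered Gaussian
because, by (23.39),

$$\begin{aligned} &\bigl(\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W} + \boldsymbol{\mu} \text{ for some } \boldsymbol{\mu} \in \mathbb{R}^n \text{ and } \mathbf{W} \text{ standard Gaussian}\bigr) \\ &\quad \iff \bigl(\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W} + \mathsf{E}[\mathbf{X}] \text{ and } \mathbf{W} \text{ standard Gaussian}\bigr) \\ &\quad \iff \bigl(\mathbf{X} - \mathsf{E}[\mathbf{X}] \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W} \text{ and } \mathbf{W} \text{ standard Gaussian}\bigr). \end{aligned} \tag{23.40}$$

From (23.40) it also follows that the centered Gaussians are the Gaussian vectors
of zero mean.$^8$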



8 Thus, the name "centered Gaussian," which we gave in Definition 23.1.1 was not misleading.
A vector is a "centered Gaussian" if, and only if, it is Gaussian and centered.

Using the definition of a centered Gaussian and using (23.39) we can readily show
that every positive semidefinite matrix is the covariance matrix of some centered
Gaussian. In fact, more is true:

Proposition 23.6.1 (Covariance Matrices and Positive Semidefinite Matrices). 

The covariance matrix of every finite-variance random vector is positive semidefi- 
nite, and every positive semidefinite matrix is the covariance matrix of some cen- 
tered Gaussian random vector. 

Proof. The covariance matrix of every finite-variance random vector is positive
semidefinite because every covariance matrix is symmetric (23.29) and satisfies
(23.28). We next establish the reverse. Given an $n \times n$ positive semidefinite
matrix $\mathsf{K}$ we shall construct a centered Gaussian $\mathbf{X}$ whose covariance matrix
$\mathsf{K}_{\mathbf{XX}}$ is equal to $\mathsf{K}$. We begin by noting that, since $\mathsf{K}$ is positive
semidefinite, it follows from Proposition 23.3.6 that there exists some $n \times n$
matrix $\mathsf{S}$ such that $\mathsf{S}^{\mathsf{T}}\mathsf{S} = \mathsf{K}$. Let $\mathbf{W}$ be a standard Gaussian $n$-vector and
consider the vector $\mathbf{X} = \mathsf{S}^{\mathsf{T}}\mathbf{W}$. Being the result of a linear transformation of
the standard Gaussian $\mathbf{W}$, this vector is a centered Gaussian. We complete the
proof by showing that its covariance matrix $\mathsf{K}_{\mathbf{XX}}$ is the prespecified matrix $\mathsf{K}$.
This follows from the calculation

\[
\mathsf{K}_{\mathbf{XX}} = \mathsf{S}^{\mathsf{T}}\mathsf{S} = \mathsf{K},
\]

where the first equality follows from (23.39) (by substituting $\mathsf{S}^{\mathsf{T}}$ for $\mathsf{A}$ and
$\mathbf{0}$ for $\boldsymbol{\mu}$) and the second from our choice of $\mathsf{S}$ as satisfying $\mathsf{S}^{\mathsf{T}}\mathsf{S} = \mathsf{K}$. □
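
The construction in this proof can be tried numerically. The following is a minimal sketch, not part of the original development: it assumes that K is positive definite (a stronger condition than the positive semidefiniteness required by the proposition) so that a Cholesky factorization can supply a matrix S with S^T S = K. The helper names `cholesky_lower`, `transpose`, and `matmul` and the matrix `K` are hypothetical choices made for illustration.

```python
import math

def cholesky_lower(K):
    """Return lower-triangular L with L L^T = K.

    Assumption: K is symmetric positive definite.
    """
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

K = [[4.0, 2.0], [2.0, 3.0]]          # hypothetical target covariance matrix
S = transpose(cholesky_lower(K))      # upper triangular, so S^T S = K
A = transpose(S)                      # the matrix multiplying W in X = S^T W
K_rebuilt = matmul(A, transpose(A))   # A A^T = S^T S, cf. (23.39)
```

Up to rounding, `K_rebuilt` agrees with `K`, which is the equality the proof relies on. Any other S with S^T S = K would serve equally well.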

23.6.1 Examples and Basic Properties 

In this section we provide some examples of Gaussian vectors and some simple 
properties that follow from their definition. 

(i) Every univariate $\mathcal{N}(\mu, \sigma^2)$ random variable, when viewed as a one-dimensional
random vector, is a Gaussian random vector.

Proof: Such a univariate random variable has the same law as
$\sigma W + \mu$, where $W$ is a standard univariate Gaussian.

(ii) Any deterministic vector is a Gaussian vector.

Proof: Choose the matrix $\mathsf{A}$ as the all-zero matrix $\mathsf{0}$.

(iii) If the components of $\mathbf{X}$ are independent univariate Gaussians (not necessarily
of equal variance), then $\mathbf{X}$ is a Gaussian vector.

Proof: Choose $\mathsf{A}$ to be an appropriate diagonal matrix.

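For the purposes of stating the next proposition we remind the reader that the
random vectors $\mathbf{X} = (X^{(1)}, \ldots, X^{(n_x)})^{\mathsf{T}}$ and $\mathbf{Y} = (Y^{(1)}, \ldots, Y^{(n_y)})^{\mathsf{T}}$ are independent
if, for every choice of $\xi_1, \ldots, \xi_{n_x} \in \mathbb{R}$ and $\eta_1, \ldots, \eta_{n_y} \in \mathbb{R}$,

\[
\begin{aligned}
&\Pr\bigl[X^{(1)} \le \xi_1, \ldots, X^{(n_x)} \le \xi_{n_x},\; Y^{(1)} \le \eta_1, \ldots, Y^{(n_y)} \le \eta_{n_y}\bigr] \\
&\qquad = \Pr\bigl[X^{(1)} \le \xi_1, \ldots, X^{(n_x)} \le \xi_{n_x}\bigr] \, \Pr\bigl[Y^{(1)} \le \eta_1, \ldots, Y^{(n_y)} \le \eta_{n_y}\bigr].
\end{aligned}
\]

The following proposition is a consequence of the fact that if $\mathbf{X}_1$ & $\mathbf{X}_2$ are
independent, $\mathbf{X}_1 \stackrel{\mathscr{L}}{=} \mathbf{X}'_1$, $\mathbf{X}_2 \stackrel{\mathscr{L}}{=} \mathbf{X}'_2$, and $\mathbf{X}'_1$ & $\mathbf{X}'_2$ are independent, then

\[
\begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix} \stackrel{\mathscr{L}}{=} \begin{pmatrix} \mathbf{X}'_1 \\ \mathbf{X}'_2 \end{pmatrix}.
\]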

Proposition 23.6.2 (Stacking Independent Gaussian Vectors). Stacking two in- 
dependent Gaussian vectors one on top of the other results in a Gaussian vector. 

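Proof. Let the random $n_1$-vector $\mathbf{X}_1 = (X_1^{(1)}, \ldots, X_1^{(n_1)})^{\mathsf{T}}$ be Gaussian, and let the
random $n_2$-vector $\mathbf{X}_2 = (X_2^{(1)}, \ldots, X_2^{(n_2)})^{\mathsf{T}}$ be Gaussian and independent of $\mathbf{X}_1$.
We need to show that the $(n_1 + n_2)$-vector

\[
\bigl(X_1^{(1)}, \ldots, X_1^{(n_1)}, X_2^{(1)}, \ldots, X_2^{(n_2)}\bigr)^{\mathsf{T}} \tag{23.41}
\]

is Gaussian.

Let the pair $(\mathsf{A}_1, \boldsymbol{\mu}_1)$ represent $\mathbf{X}_1$ in the sense that $\mathbf{X}_1 \stackrel{\mathscr{L}}{=} \mathsf{A}_1\mathbf{W}_1 + \boldsymbol{\mu}_1$, where $\mathsf{A}_1$
is $n_1 \times m_1$, $\boldsymbol{\mu}_1 \in \mathbb{R}^{n_1}$, and $\mathbf{W}_1$ is a standard Gaussian $m_1$-vector. Similarly, let
the pair $(\mathsf{A}_2, \boldsymbol{\mu}_2)$ represent $\mathbf{X}_2$, where $\mathsf{A}_2$ is $n_2 \times m_2$ and $\boldsymbol{\mu}_2 \in \mathbb{R}^{n_2}$. We next
show that the vector (23.41) can be represented using the $(n_1 + n_2) \times (m_1 + m_2)$
block-diagonal matrix $\mathsf{A}$ of diagonal components $\mathsf{A}_1$ and $\mathsf{A}_2$, and using the vector
$\boldsymbol{\mu} \in \mathbb{R}^{n_1 + n_2}$ that results when the vector $\boldsymbol{\mu}_1$ is stacked on top of the vector $\boldsymbol{\mu}_2$:

\[
\mathsf{A} = \begin{pmatrix} \mathsf{A}_1 & \mathsf{0} \\ \mathsf{0} & \mathsf{A}_2 \end{pmatrix}, \qquad
\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}. \tag{23.42}
\]

Indeed, if $\mathbf{W}$ is a standard Gaussian $(m_1 + m_2)$-vector and if we denote by $\mathbf{W}_1$ its
first $m_1$ components and by $\mathbf{W}_2$ its last $m_2$ components, then the random vectors
$\mathbf{W}_1$ and $\mathbf{W}_2$ are independent, and each is a standard Gaussian. Consequently,

\[
\begin{aligned}
\mathsf{A}\mathbf{W} + \boldsymbol{\mu}
&= \begin{pmatrix} \mathsf{A}_1 & \mathsf{0} \\ \mathsf{0} & \mathsf{A}_2 \end{pmatrix}
   \begin{pmatrix} \mathbf{W}_1 \\ \mathbf{W}_2 \end{pmatrix}
   + \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix} \\
&= \begin{pmatrix} \mathsf{A}_1\mathbf{W}_1 + \boldsymbol{\mu}_1 \\ \mathsf{A}_2\mathbf{W}_2 + \boldsymbol{\mu}_2 \end{pmatrix} \\
&\stackrel{\mathscr{L}}{=} \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix},
\end{aligned}
\]

where the first equality follows from the definition of $\mathsf{A}$ and $\boldsymbol{\mu}$ in (23.42); the
second equality by computing the matrix product in blocks; and where the equality
in distribution follows because the fact that $\mathbf{W}_1$ is a standard Gaussian implies
that $\mathbf{X}_1 \stackrel{\mathscr{L}}{=} \mathsf{A}_1\mathbf{W}_1 + \boldsymbol{\mu}_1$, the fact that $\mathbf{W}_2$ is a standard Gaussian implies that
$\mathbf{X}_2 \stackrel{\mathscr{L}}{=} \mathsf{A}_2\mathbf{W}_2 + \boldsymbol{\mu}_2$, and the fact that $\mathbf{W}_1$ and $\mathbf{W}_2$ are independent implies that
$\mathsf{A}_1\mathbf{W}_1 + \boldsymbol{\mu}_1$ and $\mathsf{A}_2\mathbf{W}_2 + \boldsymbol{\mu}_2$ are independent. □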
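
The block computation in this proof can be checked on a small instance. The sketch below is illustrative only: the matrices `A1`, `A2`, the vectors `mu1`, `mu2`, and the realization `w` are hypothetical values, and `block_diag` and `matvec` are helper names introduced here. It verifies that applying the block-diagonal matrix to a stacked realization gives the same result as applying the two blocks separately.

```python
def block_diag(A1, A2):
    """Return the block-diagonal matrix with diagonal blocks A1 and A2."""
    m1, m2 = len(A1[0]), len(A2[0])
    top = [row + [0.0] * m2 for row in A1]
    bottom = [[0.0] * m1 + row for row in A2]
    return top + bottom

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A1, mu1 = [[1.0, 2.0]], [0.5]                 # n1 = 1, m1 = 2
A2, mu2 = [[3.0], [4.0]], [-1.0, 1.0]         # n2 = 2, m2 = 1
A, mu = block_diag(A1, A2), mu1 + mu2         # (n1 + n2) x (m1 + m2), and stacked mean

w = [0.3, -0.7, 1.1]                          # one realization of W, m1 + m2 = 3
w1, w2 = w[:2], w[2:]                         # first m1 and last m2 components

stacked = [a + b for a, b in zip(matvec(A, w), mu)]
separate = ([a + b for a, b in zip(matvec(A1, w1), mu1)]
            + [a + b for a, b in zip(matvec(A2, w2), mu2)])
```

Here `stacked` and `separate` coincide, mirroring the second equality in the displayed chain; the equality in law is, of course, a statement about distributions and is not something a single realization can demonstrate.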
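Proposition 23.6.3 (An Affine Transformation of a Gaussian Is a Gaussian).

Let $\mathbf{X}$ be a Gaussian $n$-vector. If $\mathsf{C}$ is a $\nu \times n$ matrix and if $\mathbf{d} \in \mathbb{R}^{\nu}$, then the
random $\nu$-vector $\mathsf{C}\mathbf{X} + \mathbf{d}$ is Gaussian.

Proof. If $\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W} + \boldsymbol{\mu}$, where $\mathsf{A}$ is a deterministic $n \times m$ matrix, $\mathbf{W}$ is a standard
Gaussian $m$-vector, and $\boldsymbol{\mu} \in \mathbb{R}^n$, then

\[
\begin{aligned}
\mathsf{C}\mathbf{X} + \mathbf{d} &\stackrel{\mathscr{L}}{=} \mathsf{C}(\mathsf{A}\mathbf{W} + \boldsymbol{\mu}) + \mathbf{d} \\
&= (\mathsf{C}\mathsf{A})\mathbf{W} + (\mathsf{C}\boldsymbol{\mu} + \mathbf{d}),
\end{aligned}
\tag{23.43}
\]

which demonstrates that $\mathsf{C}\mathbf{X} + \mathbf{d}$ is Gaussian, because $(\mathsf{C}\mathsf{A})$ is a deterministic $\nu \times m$
matrix, $\mathbf{W}$ is a standard Gaussian $m$-vector, and $\mathsf{C}\boldsymbol{\mu} + \mathbf{d}$ is a deterministic vector
in $\mathbb{R}^{\nu}$. □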
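
As a small illustration of (23.43), the following sketch computes the pair (CA, Cμ + d) that represents CX + d and checks that its covariance (CA)(CA)^T agrees with C K C^T, where K = AA^T. The numerical values and the helper names `matmul`, `matvec`, and `transpose` are assumptions made for the example.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1.0, 0.0], [1.0, 2.0]]       # hypothetical representation X = A W + mu
mu = [1.0, -1.0]
C = [[1.0, 1.0]]                   # a 1 x 2 matrix, so nu = 1
d = [3.0]

CA = matmul(C, A)                                        # matrix representing CX + d
new_mu = [x + y for x, y in zip(matvec(C, mu), d)]       # C mu + d

K = matmul(A, transpose(A))                              # covariance of X, cf. (23.39)
cov_direct = matmul(CA, transpose(CA))                   # (CA)(CA)^T
cov_via_K = matmul(matmul(C, K), transpose(C))           # C K C^T
```

Both covariance computations give the same 1 x 1 matrix, as they must, since (CA)(CA)^T = C(AA^T)C^T.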

This proposition has some important consequences. The first is that if we permute 
the components of a Gaussian vector then the resulting vector is also Gaussian. 
This explains why we sometimes say of random variables that they are jointly Gaus- 
sian without specifying an order. Indeed, by the following corollary, the Gaussianity 
of (X, Y, Z) T is equivalent to the Gaussianity of (Y, X, Z) T , etc. 

Corollary 23.6.4. Permuting the components of a Gaussian vector results in a 
Gaussian vector. 

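Proof. Follows from Proposition 23.6.3 by choosing $\mathsf{C}$ to be the appropriate
permutation matrix, i.e., the matrix that results from permuting the columns of the
identity matrix. For example,

\[
\begin{pmatrix} Y \\ X \\ Z \end{pmatrix}
= \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
  \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}.
\]

□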



Corollary 23.6.5 (Subsets of Jointly Gaussians Are Jointly Gaussian). Construct- 
ing a random p-vector from a Gaussian n-vector by picking p of its components 
(allowing for repetition) yields a Gaussian vector. 

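Proof. Let $\mathbf{X}$ be a Gaussian $n$-vector. For any choice of $j_1, \ldots, j_p \in \{1, \ldots, n\}$,
we can express the random $p$-vector $(X^{(j_1)}, \ldots, X^{(j_p)})^{\mathsf{T}}$ as $\mathsf{C}\mathbf{X}$, where $\mathsf{C}$ is a
deterministic $p \times n$ matrix whose Row-$\nu$ Column-$\ell$ component is given by

\[
[\mathsf{C}]_{\nu,\ell} = \mathrm{I}\{\ell = j_\nu\}.
\]

For example,

\[
\begin{pmatrix} X^{(3)} \\ X^{(1)} \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}
  \begin{pmatrix} X^{(1)} \\ X^{(2)} \\ X^{(3)} \end{pmatrix}.
\]

The result thus follows from Proposition 23.6.3. □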
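
The matrix C used in the last two corollaries is easy to build explicitly. The sketch below is a hypothetical helper (the name `selection_matrix` and the sample vector are ours): it constructs the matrix whose Row-ν Column-ℓ entry is one exactly when ℓ equals the ν-th chosen index, and applies it once to pick components and once to permute them.

```python
def selection_matrix(indices, n):
    """p x n matrix with entry 1 in row nu at the (1-based) column indices[nu]."""
    return [[1.0 if col == j else 0.0 for col in range(1, n + 1)] for j in indices]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

x = [10.0, 20.0, 30.0]                    # a realization of a 3-vector

C = selection_matrix([3, 1], 3)           # picks the third and first components
picked = matvec(C, x)

P = selection_matrix([2, 1, 3], 3)        # a permutation matrix
permuted = matvec(P, x)
```

Repetitions are allowed: `selection_matrix([1, 1], 3)` simply produces two identical rows.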

Proposition 23.6.6. Each component of a Gaussian vector is a univariate Gaussian.

Proof. Let $\mathbf{X}$ be a Gaussian $n$-vector, and let $j \in \{1, \ldots, n\}$ be arbitrary. We
need to show that $X^{(j)}$ is Gaussian. Since $\mathbf{X}$ is Gaussian, there exist an $n \times m$
matrix $\mathsf{A}$, a vector $\boldsymbol{\mu} \in \mathbb{R}^n$, and a standard Gaussian $\mathbf{W}$ such that the vector $\mathbf{X}$
has the same law as the random vector $\mathsf{A}\mathbf{W} + \boldsymbol{\mu}$ (Definition 23.1.1). In particular,
the $j$-th component of $\mathbf{X}$ has the same law as the $j$-th component of $\mathsf{A}\mathbf{W} + \boldsymbol{\mu}$, i.e.,

\[
X^{(j)} \stackrel{\mathscr{L}}{=} \sum_{\ell=1}^{m} [\mathsf{A}]_{j,\ell}\, W^{(\ell)} + \mu^{(j)}, \quad j \in \{1, \ldots, n\}.
\]

The sum on the RHS is a linear combination of the independent univariate Gaussians
$W^{(1)}, \ldots, W^{(m)}$ and is thus, by Proposition 19.7.3, Gaussian. The result of
adding $\mu^{(j)}$ is still Gaussian. □

We caution the reader that while each component of a Gaussian vector has a 
univariate Gaussian distribution, there exist random vectors that are not Gaussian 
and that yet have Gaussian components. 
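
One standard example of this phenomenon, which is not taken from the text and is offered here only as an illustration, takes X to be a standard Gaussian and sets Y = BX, where B is independent of X and takes on the values ±1 equiprobably. By symmetry Y is also a standard Gaussian, yet X + Y is zero with probability one half and nonzero otherwise, so it is neither deterministic nor does it have a density. By Propositions 23.6.3 and 23.6.6 the pair (X, Y) therefore cannot be a Gaussian vector. The simulation below (sample size and seed are arbitrary choices) estimates the probability that X + Y is zero.

```python
import random

rng = random.Random(0)      # fixed seed so that the run is reproducible
n = 20000
zero_sums = 0
for _ in range(n):
    x = rng.gauss(0.0, 1.0)             # X is a standard Gaussian
    sign = rng.choice((-1.0, 1.0))      # B is +1 or -1 equiprobably, independent of X
    y = sign * x                        # Y = B X is again a standard Gaussian
    if x + y == 0.0:                    # happens exactly when B = -1
        zero_sums += 1

fraction_zero = zero_sums / n
```

The estimate `fraction_zero` should be close to one half, which is incompatible with X + Y being a univariate Gaussian.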

23.6.2 The Mean and Covariance Determine the Law of a Gaussian 

From (23.39) it follows that if $\mathbf{X} = \mathsf{A}\mathbf{W} + \boldsymbol{\mu}$, where $\mathbf{W}$ is a standard Gaussian,
then $\boldsymbol{\mu}$ must be equal to $\mathrm{E}[\mathbf{X}]$. Thus, the mean of $\mathbf{X}$ fully determines the vector $\boldsymbol{\mu}$.
The matrix $\mathsf{A}$, however, is not determined by the covariance of $\mathbf{X}$. Indeed, by
(23.39), the covariance matrix $\mathsf{K}_{\mathbf{XX}}$ of $\mathbf{X}$ is equal to $\mathsf{A}\mathsf{A}^{\mathsf{T}}$, so $\mathsf{K}_{\mathbf{XX}}$ only determines
the product $\mathsf{A}\mathsf{A}^{\mathsf{T}}$. Since there are many different ways to express $\mathsf{K}_{\mathbf{XX}}$ as the
product of a matrix by its transpose, there are many choices of $\mathsf{A}$ (even of different
dimensions) that result in $\mathsf{A}\mathbf{W} + \boldsymbol{\mu}$ having the given covariance matrix. Prima
facie, one might think that these different choices for $\mathsf{A}$ yield different Gaussian
distributions. But this is not the case. In this section we shall show that, while the
choice of $\mathsf{A}$ is not unique, all choices that result in $\mathsf{A}\mathsf{A}^{\mathsf{T}}$ having the given covariance
matrix $\mathsf{K}_{\mathbf{XX}}$ give rise to the same distribution.

We shall derive this result by computing the characteristic function $\Phi_{\mathbf{X}}(\cdot)$ of a
random $n$-vector $\mathbf{X}$ whose law is equal to the law of $\mathsf{A}\mathbf{W} + \boldsymbol{\mu}$, where $\mathbf{W}$, $\mathsf{A}$, and $\boldsymbol{\mu}$
are as above and by then showing that $\Phi_{\mathbf{X}}(\cdot)$ depends on $\mathsf{A}$ only via $\mathsf{A}\mathsf{A}^{\mathsf{T}}$, i.e.,
that $\Phi_{\mathbf{X}}(\boldsymbol{\varpi})$ can be computed for every $\boldsymbol{\varpi} \in \mathbb{R}^n$ from $\boldsymbol{\varpi}$, $\mathsf{A}\mathsf{A}^{\mathsf{T}}$, and $\boldsymbol{\mu}$. Since, by
(23.39), $\mathsf{A}\mathsf{A}^{\mathsf{T}}$ is equal to the covariance matrix $\mathsf{K}_{\mathbf{XX}}$ of $\mathbf{X}$, it will follow that the
characteristic functions of all Gaussian vectors of a given mean vector and a given
covariance matrix are identical. Since random vectors of identical characteristic
functions must have identical distributions (Proposition 23.4.4), it will follow that
all Gaussian vectors of a given mean vector and a given covariance matrix have
identical distributions.

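We thus proceed to compute the characteristic function of a random $n$-vector $\mathbf{X}$
whose law is the law of $\mathsf{A}\mathbf{W} + \boldsymbol{\mu}$, where $\mathbf{W}$ is a standard Gaussian $m$-vector, $\mathsf{A}$
is $n \times m$, and $\boldsymbol{\mu} \in \mathbb{R}^n$. By (23.39) it follows that $\mathsf{K}_{\mathbf{XX}} = \mathsf{A}\mathsf{A}^{\mathsf{T}}$. To that end we
need to compute $\mathrm{E}\bigl[e^{i\boldsymbol{\varpi}^{\mathsf{T}}\mathbf{X}}\bigr]$ for every $\boldsymbol{\varpi} \in \mathbb{R}^n$. From Proposition 23.6.3 (with the
substitution of the $1 \times n$ matrix $\boldsymbol{\varpi}^{\mathsf{T}}$ for $\mathsf{C}$ and of the scalar zero for $\mathbf{d}$), it follows
that $\boldsymbol{\varpi}^{\mathsf{T}}\mathbf{X}$ is a Gaussian vector with only one component. By Proposition 23.6.6,
this sole component is a univariate Gaussian. Its mean is, by (23.25a), $\boldsymbol{\varpi}^{\mathsf{T}}\boldsymbol{\mu}$ and
its variance is, by (23.30), $\boldsymbol{\varpi}^{\mathsf{T}}\mathsf{K}_{\mathbf{XX}}\boldsymbol{\varpi}$. Thus,

\[
\boldsymbol{\varpi}^{\mathsf{T}}\mathbf{X} \sim \mathcal{N}\bigl(\boldsymbol{\varpi}^{\mathsf{T}}\boldsymbol{\mu},\; \boldsymbol{\varpi}^{\mathsf{T}}\mathsf{K}_{\mathbf{XX}}\boldsymbol{\varpi}\bigr), \quad \boldsymbol{\varpi} \in \mathbb{R}^n. \tag{23.44}
\]

Using the expression (19.29) for the characteristic function of the univariate
Gaussian distribution (with the substitution $\boldsymbol{\varpi}^{\mathsf{T}}\boldsymbol{\mu}$ for $\mu$, the substitution $\boldsymbol{\varpi}^{\mathsf{T}}\mathsf{K}_{\mathbf{XX}}\boldsymbol{\varpi}$
for $\sigma^2$, and the substitution $1$ for $\varpi$), we obtain that the characteristic
function $\Phi_{\mathbf{X}}(\cdot)$, which is defined as $\mathrm{E}\bigl[e^{i\boldsymbol{\varpi}^{\mathsf{T}}\mathbf{X}}\bigr]$, is given by

\[
\Phi_{\mathbf{X}}(\boldsymbol{\varpi}) = e^{-\frac{1}{2}\boldsymbol{\varpi}^{\mathsf{T}}\mathsf{K}_{\mathbf{XX}}\boldsymbol{\varpi} + i\boldsymbol{\varpi}^{\mathsf{T}}\boldsymbol{\mu}}, \quad \boldsymbol{\varpi} \in \mathbb{R}^n. \tag{23.45}
\]

Since this characteristic function is fully determined by the mean vector and the
covariance matrix of $\mathbf{X}$, it follows that the distribution is also determined by the
mean and covariance. We have thus proved: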

Theorem 23.6.7 (The Mean and Covariance of a Gaussian Determine its Law). 

Two Gaussian vectors of equal mean vectors and of equal covariance matrices have 
identical distributions. 

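Note 23.6.8. Theorem 23.6.7 and Proposition 23.6.1 combine to prove that for
every $\boldsymbol{\mu} \in \mathbb{R}^n$ and every $n \times n$ positive semidefinite matrix $\mathsf{K}$ there exists one, and
only one, Gaussian distribution of mean $\boldsymbol{\mu}$ and covariance matrix $\mathsf{K}$. We denote
this Gaussian distribution by $\mathcal{N}(\boldsymbol{\mu}, \mathsf{K})$.

By (23.45) it follows that if $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \mathsf{K})$ then

\[
\Phi_{\mathbf{X}}(\boldsymbol{\varpi}) = e^{-\frac{1}{2}\boldsymbol{\varpi}^{\mathsf{T}}\mathsf{K}\boldsymbol{\varpi} + i\boldsymbol{\varpi}^{\mathsf{T}}\boldsymbol{\mu}}, \quad \boldsymbol{\varpi} \in \mathbb{R}^n. \tag{23.46}
\]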
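
The point that different matrices A can give rise to the same law may be illustrated numerically. The sketch below is only a hedged example: the three matrices (one square, one obtained from it by a rotation, and one with an extra all-zero column) are hypothetical, as are the helper names. All three have the same product AA^T, so the characteristic function of (23.45), which depends on A only through AA^T, takes the same value for each of them.

```python
import cmath

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def char_fn(varpi, mu, K):
    """Evaluate exp(-0.5 * varpi^T K varpi + i * varpi^T mu), cf. (23.45)."""
    n = len(varpi)
    quad = sum(varpi[i] * K[i][j] * varpi[j] for i in range(n) for j in range(n))
    lin = sum(v * m for v, m in zip(varpi, mu))
    return cmath.exp(-0.5 * quad + 1j * lin)

A_square = [[1.0, 1.0], [0.0, 1.0]]
A_rotated = [[1.0, -1.0], [1.0, 0.0]]            # A_square times a 90-degree rotation
A_wide = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]      # 2 x 3, extra all-zero column

covs = [matmul(A, transpose(A)) for A in (A_square, A_rotated, A_wide)]
mu = [0.5, -1.0]
varpi = [0.3, 0.8]
values = [char_fn(varpi, mu, K) for K in covs]
```

The three entries of `covs` are identical, and hence so are the three entries of `values`.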




Theorem 23.6.7 has important consequences, one of which has to do with the 
properties of independence and uncorrelatedness. Recall that any two independent 
random variables (of finite mean) are also uncorrelated. The reverse is not in 
general true. But for jointly Gaussians it is: if X and Y are jointly Gaussian, then 
X and Y are independent if, and only if, they are uncorrelated. More generally: 

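Corollary 23.6.9. Let $\mathbf{X}$ be a centered Gaussian $(n_1 + n_2)$-vector. Let the random
$n_1$-vector $\mathbf{X}_1 = (X^{(1)}, \ldots, X^{(n_1)})^{\mathsf{T}}$ correspond to its first $n_1$ components, and let
$\mathbf{X}_2 = (X^{(n_1+1)}, \ldots, X^{(n_1+n_2)})^{\mathsf{T}}$ correspond to the rest of its components. Then the
vectors $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent if, and only if, they are uncorrelated, i.e., if,
and only if,

\[
\mathrm{E}\bigl[\mathbf{X}_1\mathbf{X}_2^{\mathsf{T}}\bigr] = \mathsf{0}. \tag{23.47}
\]

Proof. The easy direction, which has nothing to do with Gaussianity, is that if
$\mathbf{X}_1$ and $\mathbf{X}_2$ are centered and independent, then (23.47) holds. Indeed, by the
independence and the fact that the vectors are of zero mean we have

\[
\begin{aligned}
\mathrm{E}\bigl[\mathbf{X}_1\mathbf{X}_2^{\mathsf{T}}\bigr] &= \mathrm{E}[\mathbf{X}_1]\,\mathrm{E}\bigl[\mathbf{X}_2^{\mathsf{T}}\bigr] \\
&= \mathrm{E}[\mathbf{X}_1]\,\bigl(\mathrm{E}[\mathbf{X}_2]\bigr)^{\mathsf{T}} \\
&= \mathbf{0}\,\mathbf{0}^{\mathsf{T}} \\
&= \mathsf{0}.
\end{aligned}
\]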
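We now prove the reverse using the Gaussianity. We begin by expressing the
covariance matrix of $\mathbf{X}$ in terms of the covariance matrices of $\mathbf{X}_1$ and $\mathbf{X}_2$ as

\[
\mathsf{K}_{\mathbf{XX}}
= \begin{pmatrix}
\mathrm{E}\bigl[\mathbf{X}_1\mathbf{X}_1^{\mathsf{T}}\bigr] & \mathrm{E}\bigl[\mathbf{X}_1\mathbf{X}_2^{\mathsf{T}}\bigr] \\
\mathrm{E}\bigl[\mathbf{X}_2\mathbf{X}_1^{\mathsf{T}}\bigr] & \mathrm{E}\bigl[\mathbf{X}_2\mathbf{X}_2^{\mathsf{T}}\bigr]
\end{pmatrix}
= \begin{pmatrix}
\mathsf{K}_{\mathbf{X}_1\mathbf{X}_1} & \mathsf{0} \\
\mathsf{0} & \mathsf{K}_{\mathbf{X}_2\mathbf{X}_2}
\end{pmatrix}, \tag{23.48}
\]

where the second equality follows from (23.47).

Next, let $\mathbf{X}'_1$ and $\mathbf{X}'_2$ be independent random vectors such that $\mathbf{X}'_1 \stackrel{\mathscr{L}}{=} \mathbf{X}_1$ and
$\mathbf{X}'_2 \stackrel{\mathscr{L}}{=} \mathbf{X}_2$. Let $\mathbf{X}'$ be the $(n_1 + n_2)$-vector that results from stacking $\mathbf{X}'_1$ on top
of $\mathbf{X}'_2$. Since $\mathbf{X}$ is Gaussian, it follows from Corollary 23.6.5 that $\mathbf{X}_1$ must also
be Gaussian, and since $\mathbf{X}'_1$ has the same law as $\mathbf{X}_1$, it too is Gaussian. Similarly,
$\mathbf{X}'_2$ is also Gaussian. And since $\mathbf{X}'_1$ and $\mathbf{X}'_2$ are, by construction, independent, it
follows from Proposition 23.6.2 that $\mathbf{X}'$ is a centered Gaussian.

Having established that $\mathbf{X}'$ is Gaussian, we next compute its covariance matrix.
Since, by construction, $\mathbf{X}'_1$ and $\mathbf{X}'_2$ are independent and centered,

\[
\mathsf{K}_{\mathbf{X}'\mathbf{X}'}
= \begin{pmatrix}
\mathsf{K}_{\mathbf{X}'_1\mathbf{X}'_1} & \mathsf{0} \\
\mathsf{0} & \mathsf{K}_{\mathbf{X}'_2\mathbf{X}'_2}
\end{pmatrix}
= \begin{pmatrix}
\mathsf{K}_{\mathbf{X}_1\mathbf{X}_1} & \mathsf{0} \\
\mathsf{0} & \mathsf{K}_{\mathbf{X}_2\mathbf{X}_2}
\end{pmatrix}, \tag{23.49}
\]

where the second equality follows because the equality in law between $\mathbf{X}'_1$ and $\mathbf{X}_1$
implies that $\mathsf{K}_{\mathbf{X}'_1\mathbf{X}'_1} = \mathsf{K}_{\mathbf{X}_1\mathbf{X}_1}$ and similarly for $\mathbf{X}'_2$.

Comparing (23.49) and (23.48) we conclude that $\mathbf{X}$ and $\mathbf{X}'$ are centered Gaussians
of identical covariance matrices. Consequently, by Theorem 23.6.7, $\mathbf{X}' \stackrel{\mathscr{L}}{=} \mathbf{X}$. And
since the first $n_1$ components of $\mathbf{X}'$ are independent of its last $n_2$ components, the
same must also be true for $\mathbf{X}$. □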

Corollary 23.6.10. If the components of the Gaussian random vector $\mathbf{X}$ are
uncorrelated and the matrix $\mathsf{K}_{\mathbf{XX}}$ is therefore diagonal, then the components of $\mathbf{X}$ are
independent.

Proof. By repeated application of Corollary 23.6.9. □ 

Another consequence of the fact that there is only one multivariate Gaussian
distribution of a given mean vector and of a given covariance matrix has to do with
pairwise independence and independence. Recall that the random variables $X_1, \ldots, X_n$
are pairwise independent if for each pair of distinct indices $\nu', \nu'' \in \{1, \ldots, n\}$
the random variables $X_{\nu'}$ and $X_{\nu''}$ are independent, i.e., if for all such $\nu', \nu''$ and
all $\xi_{\nu'}, \xi_{\nu''} \in \mathbb{R}$

\[
\Pr\bigl[X_{\nu'} \le \xi_{\nu'},\; X_{\nu''} \le \xi_{\nu''}\bigr] = \Pr\bigl[X_{\nu'} \le \xi_{\nu'}\bigr] \, \Pr\bigl[X_{\nu''} \le \xi_{\nu''}\bigr]. \tag{23.50}
\]

The random variables $X_1, \ldots, X_n$ are independent if for all $\xi_1, \ldots, \xi_n$ in $\mathbb{R}$

\[
\Pr\bigl[X_j \le \xi_j \text{ for all } j \in \{1, \ldots, n\}\bigr] = \prod_{j=1}^{n} \Pr\bigl[X_j \le \xi_j\bigr]. \tag{23.51}
\]
Independence implies pairwise independence, but the two are not equivalent. One
can find triplets of random variables that are pairwise independent but not
independent. 9 But if $X_1, \ldots, X_n$ are jointly Gaussian, then pairwise independence is
equivalent to independence:

Corollary 23.6.11. If the components of a Gaussian random vector are pairwise
independent, then they are independent.

Proof. If the components of the Gaussian $n$-vector $\mathbf{X}$ are pairwise independent,
then they are pairwise uncorrelated and the covariance matrix $\mathsf{K}_{\mathbf{XX}}$ must be
diagonal. Denote the diagonal elements by $\lambda_1, \ldots, \lambda_n$. Let $\boldsymbol{\mu}$ be the mean vector of $\mathbf{X}$.
Another Gaussian vector of this mean and of this covariance matrix is the Gaussian
vector whose components are independent $\mathcal{N}(\mu^{(j)}, \lambda_j)$. Since the mean and
covariance determine the distribution of Gaussian vectors, it follows that the two vectors,
in fact, have identical laws so the components of $\mathbf{X}$ are also independent. □
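
That pairwise independence does not imply independence outside the Gaussian setting can be verified by direct enumeration for the classical triple mentioned in the footnote (X and Y IID and equiprobably ±1, with Z their product). The sketch below uses exact rational arithmetic; since the random variables are discrete, it checks the factorization of the joint probability mass function, which is equivalent to the factorization of the distribution functions in (23.50) and (23.51). The function and variable names are ours.

```python
from fractions import Fraction
from itertools import product

# The four equiprobable outcomes (x, y, z) with z = x * y.
outcomes = [(x, y, x * y) for x, y in product((-1, 1), repeat=2)]
p = Fraction(1, len(outcomes))

def prob(event):
    """Probability of the set of outcomes satisfying the predicate `event`."""
    return sum(p for o in outcomes if event(o))

def pairwise_independent():
    for i, j in ((0, 1), (0, 2), (1, 2)):
        for a, b in product((-1, 1), repeat=2):
            joint = prob(lambda o: o[i] == a and o[j] == b)
            marginals = prob(lambda o: o[i] == a) * prob(lambda o: o[j] == b)
            if joint != marginals:
                return False
    return True

joint_all = prob(lambda o: o == (1, 1, 1))      # Pr[X = 1, Y = 1, Z = 1]
product_all = Fraction(1, 2) ** 3               # Pr[X = 1] Pr[Y = 1] Pr[Z = 1]
```

Every pair factorizes, yet `joint_all` and `product_all` differ, so the triple is not independent. By Corollary 23.6.11 no such example can be built from jointly Gaussian random variables.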

Corollary 23.6.12. If W is a standard Gaussian n-vector, and if U is an n x n 

orthogonal matrix, then UW is also a standard Gaussian vector. 

Proof. By Definition 23.1.1 it follows that the random vector UW is a centered 
Gaussian. By (23.39) we obtain that the orthogonality of the matrix U implies 
that the covariance matrix of this centered Gaussian is the identity matrix, which 
is also the covariance matrix of W; see (23.37). Consequently, UW and W are 
two centered Gaussian vectors of identical covariance matrices and hence, by The- 
orem 23.6.7, of equal law. Since W is standard, this implies that UW must also 
be standard. □ 
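
The only computation in this proof is that UU^T is the identity matrix. As a minimal numerical illustration, assuming for concreteness that U is a 2 x 2 rotation (the angle is an arbitrary choice and the helper names are ours), one can check this as follows.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

theta = 0.7                                   # arbitrary rotation angle
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]      # an orthogonal matrix

# By (23.39), the covariance matrix of U W is U U^T when W is standard.
cov_UW = matmul(U, transpose(U))
```

Up to rounding, `cov_UW` is the 2 x 2 identity matrix, which is also the covariance matrix of W.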

The next corollary shows that if $\mathbf{X}$ is a centered Gaussian $n$-vector, then $\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W}$
for a standard Gaussian $n$-vector $\mathbf{W}$ and some square matrix $\mathsf{A}$. That is, if the
law of an $n$-vector $\mathbf{X}$ is equal to the law of $\mathsf{A}\mathbf{W}$ where $\mathsf{A}$ is an $n \times m$ matrix and
where $\mathbf{W}$ is a standard Gaussian $m$-vector, then the law of $\mathbf{X}$ is also identical to
the law of $\mathsf{A}\mathbf{W}$, where $\mathsf{A}$ is some $n \times n$ matrix and where $\mathbf{W}$ is a standard Gaussian
$n$-vector. Consequently, we could have required in Definition 23.1.1 that the matrix
$\mathsf{A}$ be square without changing the set of distributions that we define as Gaussian.

Corollary 23.6.13. If $\mathbf{X}$ is a centered Gaussian $n$-vector, then there exists a
deterministic square $n \times n$ matrix $\mathsf{A}$ such that $\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W}$, where $\mathbf{W}$ is a standard
Gaussian $n$-vector.

Proof. Let $\mathsf{K}_{\mathbf{XX}}$ denote the covariance matrix of $\mathbf{X}$. Being a covariance matrix,
$\mathsf{K}_{\mathbf{XX}}$ must be positive semidefinite (Proposition 23.6.1). Consequently, by
Proposition 23.3.7, there exists some $n \times n$ matrix $\mathsf{S}$ such that

\[
\mathsf{K}_{\mathbf{XX}} = \mathsf{S}^{\mathsf{T}}\mathsf{S}. \tag{23.52}
\]

Consider now the centered Gaussian $\mathsf{S}^{\mathsf{T}}\mathbf{W}$, where $\mathbf{W}$ is a standard Gaussian
$n$-vector. By (23.39), the covariance matrix of $\mathsf{S}^{\mathsf{T}}\mathbf{W}$ is $\mathsf{S}^{\mathsf{T}}\mathsf{S}$, which by (23.52) is
equal to $\mathsf{K}_{\mathbf{XX}}$. Thus $\mathbf{X}$ and $\mathsf{S}^{\mathsf{T}}\mathbf{W}$ are centered Gaussians of the same covariance,
and so they must be of the same law. We have thus established that the law
of $\mathbf{X}$ is the same as the law of the product of a square matrix ($\mathsf{S}^{\mathsf{T}}$) by a standard
Gaussian ($\mathbf{W}$). □

9 A classical example is the triple X, Y, Z where X and Y are IID each taking on the values
±1 equiprobably and where Z is their product.

23.6.3 A Canonical Representation of a Centered Gaussian 

The representation of a centered Gaussian vector as the result of the multiplication
of a deterministic matrix by a standard Gaussian vector is not unique. Indeed,
whenever the $n \times m$ matrix $\mathsf{A}$ satisfies $\mathsf{A}\mathsf{A}^{\mathsf{T}} = \mathsf{K}$ it follows that if $\mathbf{W}$ is a standard
Gaussian $m$-vector, then $\mathsf{A}\mathbf{W} \sim \mathcal{N}(\mathbf{0}, \mathsf{K})$. (This follows because $\mathsf{A}\mathbf{W}$ is a random
$n$-vector of covariance matrix $\mathsf{A}\mathsf{A}^{\mathsf{T}}$ (23.39); it is, by Definition 23.1.1, a centered
Gaussian; and all centered Gaussians of a given covariance matrix have the same
law.) We saw in Corollary 23.6.13 that $\mathsf{A}$ can always be chosen as a square matrix.
Thus, to every $\mathsf{K} \succeq 0$ there exists a square matrix $\mathsf{A}$ such that $\mathsf{A}\mathbf{W} \sim \mathcal{N}(\mathbf{0}, \mathsf{K})$. In
this section we shall focus on a particular choice of the matrix $\mathsf{A}$ that is useful in
the analysis of Gaussian vectors. In this representation $\mathsf{A}$ is a square matrix that
can be written as the product of an orthogonal matrix by a diagonal matrix. The
diagonal matrix acts on $\mathbf{W}$ by stretching and shrinking its components, and the
orthogonal matrix then rotates (and possibly reflects) the result.

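Theorem 23.6.14 (A Canonical Representation of a Gaussian Vector). Let $\mathbf{X}$ be
a centered Gaussian $n$-vector of covariance matrix $\mathsf{K}_{\mathbf{XX}}$. Then

\[
\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{U}\Lambda^{1/2}\mathbf{W},
\]

where $\mathbf{W}$ is a standard Gaussian $n$-vector; the $n \times n$ matrix $\mathsf{U}$ is orthogonal; the
$n \times n$ matrix $\Lambda$ is diagonal; the diagonal elements of $\Lambda$ are the eigenvalues of $\mathsf{K}_{\mathbf{XX}}$;
and the $j$-th column of $\mathsf{U}$ is an eigenvector corresponding to the eigenvalue of $\mathsf{K}_{\mathbf{XX}}$
that is equal to the $j$-th diagonal element of $\Lambda$.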

Proof. By Proposition 23.6.1, $\mathsf{K}_{\mathbf{XX}}$ is positive semidefinite and a fortiori
symmetric. Consequently, by Proposition 23.3.6, there exists a diagonal matrix $\Lambda$
whose diagonal elements are the eigenvalues of $\mathsf{K}_{\mathbf{XX}}$ and there exists an orthogonal
matrix $\mathsf{U}$ such that $\mathsf{K}_{\mathbf{XX}}\mathsf{U} = \mathsf{U}\Lambda$, so the $j$-th column of $\mathsf{U}$ is an eigenvector
corresponding to the eigenvalue given by the $j$-th diagonal element of $\Lambda$. Since
$\mathsf{K}_{\mathbf{XX}} \succeq 0$, it follows that all its eigenvalues are nonnegative, and we can define the
matrix $\Lambda^{1/2}$ as the matrix whose components are the componentwise nonnegative
square roots of the matrix $\Lambda$. As in (23.21), choose $\mathsf{S} = \Lambda^{1/2}\mathsf{U}^{\mathsf{T}}$. We then have
that $\mathsf{K}_{\mathbf{XX}} = \mathsf{S}^{\mathsf{T}}\mathsf{S}$. If $\mathbf{W}$ is a standard Gaussian, then $\mathsf{S}^{\mathsf{T}}\mathbf{W}$ is a centered Gaussian
of covariance $\mathsf{S}^{\mathsf{T}}\mathsf{S}$. Since $\mathsf{S}^{\mathsf{T}}\mathsf{S} = \mathsf{K}_{\mathbf{XX}}$ and since there is only one
centered multivariate Gaussian distribution of a given covariance matrix, it follows
that the law of $\mathsf{S}^{\mathsf{T}}\mathbf{W}$ ($= \mathsf{U}\Lambda^{1/2}\mathbf{W}$) is the same as the law of $\mathbf{X}$. □

Corollary 23.6.15. A centered Gaussian vector can be expressed as the result of 
an orthogonal transformation applied to a random vector whose components are 
independent centered univariate Gaussians of different variances. These variances 
are the eigenvalues of the covariance matrix. 
Figure 23.1: Contour plot of the density of four different two-dimensional Gaussian
random variables: from left to right and top to bottom $\mathbf{X}_1, \ldots, \mathbf{X}_4$.



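Proof. Because the matrix $\Lambda$ in the theorem is diagonal, we can write $\Lambda^{1/2}\mathbf{W}$ as

\[
\Lambda^{1/2}\mathbf{W} = \begin{pmatrix} \sqrt{\lambda_1}\, W^{(1)} \\ \vdots \\ \sqrt{\lambda_n}\, W^{(n)} \end{pmatrix},
\]

where $\lambda_1, \ldots, \lambda_n$ are the diagonal elements of $\Lambda$, i.e., the eigenvalues of $\mathsf{K}_{\mathbf{XX}}$. Thus,
the random vector $\Lambda^{1/2}\mathbf{W}$ has independent components with the $\nu$-th component
being $\mathcal{N}(0, \lambda_\nu)$. □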
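
For a 2 x 2 covariance matrix the canonical representation can be computed in closed form. The following sketch is one possible way of doing so and is not taken from the text: it diagonalizes a symmetric 2 x 2 matrix with a single rotation (the function name `canonical_factor` and the example matrix are assumptions), forms A = UΛ^{1/2}, and confirms that AA^T reproduces the covariance matrix.

```python
import math

def canonical_factor(K):
    """For a symmetric positive semidefinite 2x2 matrix K, return (U, lam)
    with U a rotation matrix and K U = U diag(lam)."""
    a, b, d = K[0][0], K[0][1], K[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)     # angle that zeroes the off-diagonal
    c, s = math.cos(theta), math.sin(theta)
    U = [[c, -s], [s, c]]
    lam = [a * c * c + 2.0 * b * c * s + d * s * s,   # u1^T K u1
           a * s * s - 2.0 * b * c * s + d * c * c]   # u2^T K u2
    return U, lam

K = [[2.0, 1.0], [1.0, 2.0]]                     # hypothetical covariance matrix
U, lam = canonical_factor(K)

# A = U Lambda^{1/2}: scale column j of U by the square root of lam[j].
A = [[U[i][j] * math.sqrt(max(lam[j], 0.0)) for j in range(2)] for i in range(2)]

# A A^T should reproduce K, cf. (23.39).
K_rebuilt = [[sum(A[i][k] * A[j][k] for k in range(2)) for j in range(2)]
             for i in range(2)]
```

For this K the eigenvalues are 3 and 1, so the components of Λ^{1/2}W are independent with variances 3 and 1, and U rotates them by 45 degrees. The `max(..., 0.0)` guards against tiny negative values caused by rounding when K is singular.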



Figures 23.1 and 23.2 demonstrate this canonical representation. They depict the 
contour lines and mesh plots of the density functions of the following four two- 
dimensional Gaussian vectors: 
Figure 23.2: Mesh plots of the density functions of Gaussian random vectors: from
left to right and top to bottom $\mathbf{X}_1, \ldots, \mathbf{X}_4$.

where $\mathbf{W}$ is a standard Gaussian vector with two components.

Theorem 23.6.14 can be used to find a linear transformation that transforms a
given Gaussian vector to a standard Gaussian. The following is the multivariate
version of the univariate result showing that if $X \sim \mathcal{N}(\mu, \sigma^2)$, where $\sigma^2 > 0$, then
$(X - \mu)/\sigma$ has a $\mathcal{N}(0, 1)$ distribution (19.8).

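Proposition 23.6.16 (From Gaussians to Standard Gaussians). Let the random
$n$-vector $\mathbf{X}$ be $\mathcal{N}(\boldsymbol{\mu}, \mathsf{K})$, where $\mathsf{K} \succ 0$ and $\boldsymbol{\mu} \in \mathbb{R}^n$. Let the $n \times n$ matrices $\Lambda$ and $\mathsf{U}$
be such that $\Lambda$ is diagonal, $\mathsf{U}$ is orthogonal, and $\mathsf{K}\mathsf{U} = \mathsf{U}\Lambda$. Then

\[
\Lambda^{-1/2}\mathsf{U}^{\mathsf{T}}(\mathbf{X} - \boldsymbol{\mu}) \sim \mathcal{N}(\mathbf{0}, \mathsf{I}_n),
\]

where $\Lambda^{-1/2}$ is the diagonal matrix whose diagonal entries are the reciprocals of the
square roots of the diagonal elements of $\Lambda$.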

Proof. Since an affine transformation of a Gaussian vector is Gaussian
(Proposition 23.6.3), it follows that $\Lambda^{-1/2}\mathsf{U}^{\mathsf{T}}(\mathbf{X} - \boldsymbol{\mu})$ is a Gaussian vector. And since the
mean and covariance of a Gaussian vector fully specify its law (Theorem 23.6.7),
the result will follow once we show that the mean of $\Lambda^{-1/2}\mathsf{U}^{\mathsf{T}}(\mathbf{X} - \boldsymbol{\mu})$ is the zero
vector and its covariance matrix is the identity matrix. This can be readily verified
using (23.39). □
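
The verification left to the reader amounts to checking that the matrix Λ^{-1/2}U^T maps the covariance K to the identity. The sketch below does this for a hypothetical 2 x 2 example in which K is built from a chosen rotation U and chosen eigenvalues, so that KU = UΛ holds by construction; all names and numerical values are assumptions.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

theta = math.pi / 6
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]          # orthogonal
lam = [4.0, 1.0]                                  # positive eigenvalues
Lam = [[lam[0], 0.0], [0.0, lam[1]]]
K = matmul(matmul(U, Lam), transpose(U))          # K = U Lambda U^T, so K U = U Lambda

# Row i of Lambda^{-1/2} U^T is column i of U divided by sqrt(lam[i]).
whitener = [[U[j][i] / math.sqrt(lam[i]) for j in range(2)] for i in range(2)]

# Covariance of whitener (X - mu) is whitener K whitener^T.
cov_out = matmul(matmul(whitener, K), transpose(whitener))
```

Up to rounding, `cov_out` is the identity matrix, in agreement with the proposition.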



23.6.4 The Density of a Gaussian Vector 

As we saw in Corollary 23.4.2, if the covariance matrix of a centered vector is 
singular, then at least one of its components can be expressed as a deterministic 
linear combination of its other components. Consequently, random vectors with 
singular covariance matrices cannot have a density. If the covariance matrix is 
nonsingular, then the vector may or may not have a density. If it is Gaussian, then 
it does. In this section we shall derive the density of the multivariate Gaussian 
distribution when the covariance matrix is nonsingular. 

We begin with the centered case. To derive the density of a centered Gaussian
$n$-vector of positive definite covariance matrix $\mathsf{K}$ we shall use Theorem 23.6.14 to
represent the $\mathcal{N}(\mathbf{0}, \mathsf{K})$ distribution as the distribution of $\mathsf{U}\Lambda^{1/2}\mathbf{W}$ where $\mathsf{U}$ is an
orthogonal matrix and $\Lambda$ is a diagonal matrix satisfying $\mathsf{K}\mathsf{U} = \mathsf{U}\Lambda$. Note that $\Lambda$
is nonsingular because its diagonal elements are the eigenvalues of $\mathsf{K}$, which we
assume to be positive definite.

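Let

\[
\mathsf{B} = \mathsf{U}\Lambda^{1/2}, \tag{23.53}
\]

so the density we are after is the density of $\mathsf{B}\mathbf{W}$. Note that, by (23.53),

\[
\begin{aligned}
\mathsf{B}\mathsf{B}^{\mathsf{T}} &= \mathsf{U}\Lambda^{1/2}\Lambda^{1/2}\mathsf{U}^{\mathsf{T}} \\
&= \mathsf{U}\Lambda\mathsf{U}^{\mathsf{T}} \\
&= \mathsf{K}.
\end{aligned}
\tag{23.54}
\]

Also, by (23.54),

\[
\begin{aligned}
|\det(\mathsf{B})| &= \sqrt{\det(\mathsf{B})\det(\mathsf{B})} \\
&= \sqrt{\det(\mathsf{B})\det(\mathsf{B}^{\mathsf{T}})} \\
&= \sqrt{\det(\mathsf{B}\mathsf{B}^{\mathsf{T}})} \\
&= \sqrt{\det(\mathsf{K})},
\end{aligned}
\tag{23.55}
\]

where the first equality follows by expressing $|x|$ as $\sqrt{x^2}$; the second follows because
a square matrix and its transpose have the same determinant; the third because the
determinant of the product of square matrices is the product of the determinants;
and where the last equality follows from (23.54).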

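Using the formula for computing the density of $\mathsf{B}\mathbf{W}$ from that of $\mathbf{W}$
(Theorem 17.3.4), we have that if $\mathbf{X} = \mathsf{B}\mathbf{W}$, then

\[
\begin{aligned}
f_{\mathbf{X}}(\mathbf{x}) &= \frac{f_{\mathbf{W}}(\mathsf{B}^{-1}\mathbf{x})}{|\det(\mathsf{B})|} \\
&= \frac{\exp\bigl(-\frac{1}{2}(\mathsf{B}^{-1}\mathbf{x})^{\mathsf{T}}(\mathsf{B}^{-1}\mathbf{x})\bigr)}{(2\pi)^{n/2}\,|\det(\mathsf{B})|} \\
&= \frac{\exp\bigl(-\frac{1}{2}\mathbf{x}^{\mathsf{T}}(\mathsf{B}^{-1})^{\mathsf{T}}(\mathsf{B}^{-1})\mathbf{x}\bigr)}{(2\pi)^{n/2}\,|\det(\mathsf{B})|} \\
&= \frac{\exp\bigl(-\frac{1}{2}\mathbf{x}^{\mathsf{T}}(\mathsf{B}\mathsf{B}^{\mathsf{T}})^{-1}\mathbf{x}\bigr)}{(2\pi)^{n/2}\,|\det(\mathsf{B})|} \\
&= \frac{\exp\bigl(-\frac{1}{2}\mathbf{x}^{\mathsf{T}}\mathsf{K}^{-1}\mathbf{x}\bigr)}{(2\pi)^{n/2}\,|\det(\mathsf{B})|} \\
&= \frac{\exp\bigl(-\frac{1}{2}\mathbf{x}^{\mathsf{T}}\mathsf{K}^{-1}\mathbf{x}\bigr)}{(2\pi)^{n/2}\sqrt{\det(\mathsf{K})}},
\end{aligned}
\]

where the second equality follows from the density of the standard Gaussian (23.36);
the third from the rule for the transpose of the product of matrices (23.3); the fourth
from the representation of the inverse of the product of matrices as the product
of the inverses in reverse order $(\mathsf{A}\mathsf{B})^{-1} = \mathsf{B}^{-1}\mathsf{A}^{-1}$ and because transposition and
inversion commute; the fifth from (23.54); and the sixth from (23.55). It follows
that if $\mathbf{X} \sim \mathcal{N}(\mathbf{0}, \mathsf{K})$ where $\mathsf{K}$ is nonsingular, then

\[
f_{\mathbf{X}}(\mathbf{x}) = \frac{\exp\bigl(-\frac{1}{2}\mathbf{x}^{\mathsf{T}}\mathsf{K}^{-1}\mathbf{x}\bigr)}{\sqrt{(2\pi)^n \det(\mathsf{K})}}, \quad \mathbf{x} \in \mathbb{R}^n.
\]

Accounting for the mean, we have that if $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \mathsf{K})$ where $\mathsf{K}$ is nonsingular,
then

\[
f_{\mathbf{X}}(\mathbf{x}) = \frac{\exp\bigl(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}}\mathsf{K}^{-1}(\mathbf{x} - \boldsymbol{\mu})\bigr)}{\sqrt{(2\pi)^n \det(\mathsf{K})}}, \quad \mathbf{x} \in \mathbb{R}^n. \tag{23.56}
\]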
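
For n = 2 the density (23.56) can be evaluated directly, since the inverse and the determinant of a 2 x 2 matrix have simple closed forms. The function below is a minimal sketch restricted to that case; its name, its argument conventions, and the sample values are assumptions made for illustration.

```python
import math

def gaussian_density_2d(x, mu, K):
    """Density of N(mu, K) at x for a nonsingular 2x2 covariance matrix K."""
    (a, b), (c, d) = K
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("K must be positive definite for a density to exist")
    K_inv = [[d / det, -b / det], [-c / det, a / det]]
    z = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(z[i] * K_inv[i][j] * z[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** 2 * det)

# Standard Gaussian evaluated at the origin: 1 / (2 pi).
at_origin = gaussian_density_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

# The peak value at the mean scales as 1 / sqrt(det K); here det K = 4.
peak = gaussian_density_2d([1.0, -1.0], [1.0, -1.0], [[4.0, 0.0], [0.0, 1.0]])
```

The check on the determinant reflects the earlier observation that vectors with singular covariance matrices have no density.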



23.6.5 Linear Functionals of Gaussian Vectors 

A linear functional on $\mathbb{R}^n$ is a linear mapping from $\mathbb{R}^n$ to $\mathbb{R}$. For example, if $\boldsymbol{\alpha}$
is any fixed vector in $\mathbb{R}^n$, then the mapping

\[
\mathbf{x} \mapsto \boldsymbol{\alpha}^{\mathsf{T}}\mathbf{x} \tag{23.57}
\]

is a linear functional on $\mathbb{R}^n$. In fact, every linear functional on $\mathbb{R}^n$
has this form. This can be proved by using linearity to verify that we can choose
the $j$-th component of $\boldsymbol{\alpha}$ to equal the result of applying the linear functional to
the vector $\mathbf{e}_j$ whose components are all zero except for its $j$-th component, which
is equal to one.

If $\mathbf{X}$ is a Gaussian $n$-vector and if $\boldsymbol{\alpha} \in \mathbb{R}^n$, then, by Proposition 23.6.3 (applied
with the substitution of the $1 \times n$ matrix $\boldsymbol{\alpha}^{\mathsf{T}}$ for $\mathsf{C}$), it follows that $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}$ is a
Gaussian vector with only one component. By Proposition 23.6.6, this sole component
must have a univariate Gaussian distribution. We thus conclude that the result of
applying a linear functional to a Gaussian vector is a Gaussian random variable.
We next show that the reverse is also true: if $\mathbf{X}$ is of mean $\boldsymbol{\mu}$ and of covariance
matrix $\mathsf{K}$ and if the result of applying every linear functional to $\mathbf{X}$ has a
univariate Gaussian distribution, then $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \mathsf{K})$. 10 To prove this result we compute
the characteristic function of $\mathbf{X}$. For every $\boldsymbol{\varpi} \in \mathbb{R}^n$ the mapping $\mathbf{x} \mapsto \boldsymbol{\varpi}^{\mathsf{T}}\mathbf{x}$ is
a linear functional on $\mathbb{R}^n$. Consequently, our assumption that the result of the
application of every linear functional to $\mathbf{X}$ has a univariate Gaussian distribution
implies (23.44). From here we can follow the steps leading to (23.46) to conclude
that the characteristic function of $\mathbf{X}$ must be given by the RHS of (23.46). Since
this is also the characteristic function of a $\mathcal{N}(\boldsymbol{\mu}, \mathsf{K})$ random vector, it follows that
$\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \mathsf{K})$, because random vectors of identical characteristic functions must
have identical distributions (Proposition 23.4.4). We have thus proved:

Theorem 23.6.17 (Gaussian Vectors and Linear Functionals). A random vector $\mathbf{X}$
is Gaussian if, and only if, every linear functional of $\mathbf{X}$ has a univariate Gaussian
distribution.



23.7 Jointly Gaussian Vectors 

Three miracles occur when we compute the conditional distribution of $\mathbf{X}$ given
$\mathbf{Y} = \mathbf{y}$ for jointly Gaussian random vectors $\mathbf{X}$ and $\mathbf{Y}$. Before describing these
miracles we need to define jointly Gaussian vectors.

Definition 23.7.1 (Jointly Gaussian Vectors). Two random vectors are said to be 
jointly Gaussian if the vector that results when one is stacked on top of the other 
is Gaussian. 

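That is, the random $n_x$-vector $\mathbf{X} = (X^{(1)}, \ldots, X^{(n_x)})^{\mathsf{T}}$ and the random $n_y$-vector
$\mathbf{Y} = (Y^{(1)}, \ldots, Y^{(n_y)})^{\mathsf{T}}$ are jointly Gaussian if the random $(n_x + n_y)$-vector

\[
\bigl(X^{(1)}, \ldots, X^{(n_x)}, Y^{(1)}, \ldots, Y^{(n_y)}\bigr)^{\mathsf{T}}
\]

is Gaussian.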



By Corollary 23.6.5, the random vectors X and Y can only be jointly Gaussian if 
each is Gaussian. But this is not enough: both X and Y can be Gaussian without 
them being jointly Gaussian. However, if X and Y are independent Gaussian 
vectors, then, by Proposition 23.6.2, they are jointly Gaussian. 

Proposition 23.7.2. Independent Gaussian vectors are jointly Gaussian. 

By Corollary 23.6.9 we have: 

Proposition 23.7.3. If two jointly Gaussian random vectors are uncorrelated, then
they are independent.



10 It is not difficult to show that the assumption that X is of finite variance is not necessary. 
If every linear functional of X is of finite variance, then X must be of finite variance. Thus, we 
could have stated the result as follows: if a random vector is such that the result of applying 
every linear functional to it is a univariate Gaussian, then it is a multivariate Gaussian. 
Having defined jointly Gaussian random vectors we next turn to the main result of
this section. Loosely speaking, it states that if $\mathbf{X}$ and $\mathbf{Y}$ are jointly Gaussian, then
in computing the conditional distribution of $\mathbf{X}$ given $\mathbf{Y} = \mathbf{y}$ three miracles occur:

(i) the conditional distribution is a multivariate Gaussian;
(ii) its mean vector is an affine function of $\mathbf{y}$;
(iii) and its covariance matrix does not depend on $\mathbf{y}$.

Before stating this more formally, we justify two simplifying assumptions. The first 
assumption is that the covariance matrix of Y is nonsingular, so 

\[
\mathsf{K}_{\mathbf{YY}} \succ 0.
\]

The reason is that if the covariance matrix of Y is singular, then, by Corol- 
lary 23.4.2, some of its components are with probability one affine functions of 
the others, and we then have to consider two cases. If the realization y satisfies 
these affine relations, then we can just pick a subset of the components of Y that 
determine all the other components and that have a nonsingular covariance matrix 
as in Section 23.4.3 and ignore the other components of y; the ignored components 
do not alter the conditional distribution of X given Y = y. The other case where 
the realization y does not satisfy the relations that Y satisfies with probability one 
can be ignored because it occurs with probability zero. 

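The second assumption we make is that both $\mathbf{X}$ and $\mathbf{Y}$ are centered. There is no
loss in generality in making this assumption for the following reason. Conditioning
on $\mathbf{Y} = \mathbf{y}$ when $\mathbf{Y}$ has mean $\boldsymbol{\mu}_y$ is equivalent to conditioning on $\mathbf{Y} - \boldsymbol{\mu}_y = \mathbf{y} - \boldsymbol{\mu}_y$.
And if $\mathbf{X}$ has mean $\boldsymbol{\mu}_x$, then we can compute the conditional distribution of $\mathbf{X}$ by
computing the conditional distribution of $\mathbf{X} - \boldsymbol{\mu}_x$ and by then shifting the resulting
distribution by $\boldsymbol{\mu}_x$. Thus, the conditional density $f_{\mathbf{X}|\mathbf{Y}=\mathbf{y}}(\cdot)$ is given by

\[
f_{\mathbf{X}|\mathbf{Y}=\mathbf{y}}(\mathbf{x}) = f_{\mathbf{X}-\boldsymbol{\mu}_x|\mathbf{Y}-\boldsymbol{\mu}_y=\mathbf{y}-\boldsymbol{\mu}_y}(\mathbf{x} - \boldsymbol{\mu}_x), \tag{23.58}
\]

where $\mathbf{X} - \boldsymbol{\mu}_x$ & $\mathbf{Y} - \boldsymbol{\mu}_y$ are jointly Gaussian and centered whenever $\mathbf{X}$ & $\mathbf{Y}$ are
jointly Gaussian. It is now straightforward to verify that if the miracles hold for
the centered case

\[
\mathbf{x} \mapsto f_{\mathbf{X}-\boldsymbol{\mu}_x|\mathbf{Y}-\boldsymbol{\mu}_y=\mathbf{y}-\boldsymbol{\mu}_y}(\mathbf{x})
\]

then they also hold for the general case

\[
\mathbf{x} \mapsto f_{\mathbf{X}-\boldsymbol{\mu}_x|\mathbf{Y}-\boldsymbol{\mu}_y=\mathbf{y}-\boldsymbol{\mu}_y}(\mathbf{x} - \boldsymbol{\mu}_x).
\]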

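Theorem 23.7.4. Let $\mathbf{X}$ and $\mathbf{Y}$ be centered and jointly Gaussian with covariance
matrices $\mathsf{K}_{\mathbf{XX}}$ and $\mathsf{K}_{\mathbf{YY}}$. Assume that $\mathsf{K}_{\mathbf{YY}} \succ 0$. Then the conditional distribution
of $\mathbf{X}$ conditional on $\mathbf{Y} = \mathbf{y}$ is a multivariate Gaussian of mean

\[
\mathrm{E}\bigl[\mathbf{X}\mathbf{Y}^{\mathsf{T}}\bigr]\, \mathsf{K}_{\mathbf{YY}}^{-1}\, \mathbf{y} \tag{23.59}
\]

and covariance matrix

\[
\mathsf{K}_{\mathbf{XX}} - \mathrm{E}\bigl[\mathbf{X}\mathbf{Y}^{\mathsf{T}}\bigr]\, \mathsf{K}_{\mathbf{YY}}^{-1}\, \mathrm{E}\bigl[\mathbf{Y}\mathbf{X}^{\mathsf{T}}\bigr]. \tag{23.60}
\]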
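Proof. Let $n_x$ and $n_y$ denote the number of components of $\mathbf{X}$ and $\mathbf{Y}$. Let $\mathsf{D}$ be
any deterministic real $n_x \times n_y$ matrix. Then clearly

\[
\mathbf{X} = \mathsf{D}\mathbf{Y} + (\mathbf{X} - \mathsf{D}\mathbf{Y}). \tag{23.61}
\]

Since $\mathbf{X}$ and $\mathbf{Y}$ are jointly Gaussian, the vector $(\mathbf{X}^{\mathsf{T}}, \mathbf{Y}^{\mathsf{T}})^{\mathsf{T}}$ is Gaussian.
Consequently, since

\[
\begin{pmatrix} \mathbf{X} - \mathsf{D}\mathbf{Y} \\ \mathbf{Y} \end{pmatrix}
= \begin{pmatrix} \mathsf{I}_{n_x} & -\mathsf{D} \\ \mathsf{0} & \mathsf{I}_{n_y} \end{pmatrix}
  \begin{pmatrix} \mathbf{X} \\ \mathbf{Y} \end{pmatrix},
\]

it follows from Proposition 23.6.3 that

\[
(\mathbf{X} - \mathsf{D}\mathbf{Y}) \text{ and } \mathbf{Y} \text{ are centered and jointly Gaussian.} \tag{23.62}
\]

Suppose now that the matrix $\mathsf{D}$ is chosen so that $(\mathbf{X} - \mathsf{D}\mathbf{Y})$ and $\mathbf{Y}$ are uncorrelated:

\[
\mathrm{E}\bigl[(\mathbf{X} - \mathsf{D}\mathbf{Y})\mathbf{Y}^{\mathsf{T}}\bigr] = \mathsf{0}. \tag{23.63}
\]

By (23.62) and Proposition 23.7.3 it then follows that the random vector $(\mathbf{X} - \mathsf{D}\mathbf{Y})$
is independent of $\mathbf{Y}$. Consequently, with this choice of $\mathsf{D}$ we have that (23.61)
expresses $\mathbf{X}$ as the sum of two terms where the first, $\mathsf{D}\mathbf{Y}$, is fully determined
by $\mathbf{Y}$ and where the second, $(\mathbf{X} - \mathsf{D}\mathbf{Y})$, is independent of $\mathbf{Y}$. It follows that
the conditional distribution of $\mathbf{X}$ given $\mathbf{Y} = \mathbf{y}$ is the same as the distribution of
$(\mathbf{X} - \mathsf{D}\mathbf{Y})$ but shifted by $\mathsf{D}\mathbf{y}$. By (23.62) and Corollary 23.6.5, $(\mathbf{X} - \mathsf{D}\mathbf{Y})$ is a
centered Gaussian, so the conditional distribution of $\mathbf{X}$ given $\mathbf{Y} = \mathbf{y}$ is that of the
centered Gaussian $(\mathbf{X} - \mathsf{D}\mathbf{Y})$ shifted by the vector $\mathsf{D}\mathbf{y}$. This already establishes
the three "miracles" we discussed before: the conditional distribution of $\mathbf{X}$ given
$\mathbf{Y} = \mathbf{y}$ is Gaussian; its mean $\mathsf{D}\mathbf{y}$ is a linear function of $\mathbf{y}$; and its covariance matrix,
which is the covariance matrix of $(\mathbf{X} - \mathsf{D}\mathbf{Y})$, does not depend on the realization $\mathbf{y}$
of $\mathbf{Y}$.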

The remaining claims, namely that the mean of the conditional distribution is as 
given in (23.59) and that the covariance matrix is as given in (23.60) now follow 
from straightforward calculations. Indeed, by solving (23.63) for D we obtain 

D= E[XY T ] Kyy, (23.64) 

so Dy is given by (23.59). To show that the covariance of the conditional law of X 
given Y = y is as given in (23.60), we note that this covariance is the covariance 
of (X — DY), which is given by 

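\begin{align*}
\mathsf{E}[(\mathbf{X} - \mathsf{D}\mathbf{Y})(\mathbf{X} - \mathsf{D}\mathbf{Y})^{\mathsf{T}}]
  &= \mathsf{E}[(\mathbf{X} - \mathsf{D}\mathbf{Y})\mathbf{X}^{\mathsf{T}}] - \mathsf{E}[(\mathbf{X} - \mathsf{D}\mathbf{Y})(\mathsf{D}\mathbf{Y})^{\mathsf{T}}] \\
  &= \mathsf{E}[(\mathbf{X} - \mathsf{D}\mathbf{Y})\mathbf{X}^{\mathsf{T}}] - \mathsf{E}[(\mathbf{X} - \mathsf{D}\mathbf{Y})\mathbf{Y}^{\mathsf{T}}] \, \mathsf{D}^{\mathsf{T}} \\
  &= \mathsf{E}[(\mathbf{X} - \mathsf{D}\mathbf{Y})\mathbf{X}^{\mathsf{T}}] \\
  &= \mathsf{K}_{\mathbf{XX}} - \mathsf{D} \, \mathsf{E}[\mathbf{Y}\mathbf{X}^{\mathsf{T}}] \\
  &= \mathsf{K}_{\mathbf{XX}} - \mathsf{E}[\mathbf{X}\mathbf{Y}^{\mathsf{T}}] \, \mathsf{K}_{\mathbf{YY}}^{-1} \, \mathsf{E}[\mathbf{Y}\mathbf{X}^{\mathsf{T}}],
\end{align*}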

where the first equality follows by opening the second set of parentheses; the second 
by (23.3) and (23.25b); the third by (23.63); the fourth by opening the parentheses 
and using the linearity of the expectation; and the final equality by (23.64). □ 
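
The recipe in the proof can be checked numerically. The following is a minimal sketch, assuming a scalar X and a 2-vector Y generated from a standard Gaussian through a mixing matrix `A` that we chose arbitrarily for illustration; the helper names `matmul`, `transpose`, and `inv2` are ours. It computes D as in (23.64), confirms that X - DY and Y are uncorrelated as required by (23.63), and evaluates (23.59) and (23.60).

```python
# Sketch: Theorem 23.7.4 for scalar X and a 2-vector Y (standard library only).
# The mixing matrix A is an arbitrary illustrative choice, not taken from the text.

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*P)]

def inv2(M):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# (X, Y1, Y2)^T = A W with W a standard Gaussian, so the joint covariance is A A^T.
A = [[1.0, 0.5, 0.0],
     [0.3, 1.0, 0.2],
     [0.0, 0.4, 1.0]]
K = matmul(A, transpose(A))

K_xx = K[0][0]                      # variance of X
K_xy = [K[0][1:]]                   # 1x2 matrix E[X Y^T]
K_yy = [row[1:] for row in K[1:]]   # 2x2 covariance matrix of Y

D = matmul(K_xy, inv2(K_yy))        # (23.64)

# E[(X - DY) Y^T] = E[X Y^T] - D K_yy, which (23.63) requires to vanish.
DK = matmul(D, K_yy)
cross = [K_xy[0][j] - DK[0][j] for j in range(2)]

# Conditional variance (23.60) and conditional mean (23.59) for one observation y.
cond_var = K_xx - matmul(D, transpose(K_xy))[0][0]
y = [1.0, -2.0]
cond_mean = sum(D[0][j] * y[j] for j in range(2))

print("D =", D)
print("E[(X - DY) Y^T] =", cross)
print("conditional mean =", cond_mean, "conditional variance =", cond_var)
```

Note that the conditional variance does not involve `y`, in line with the third "miracle" above.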

Theorem 23.7.4 has important consequences in Estimation Theory. A key result
in Estimation Theory is that if after observing that $\mathbf{Y} = \mathbf{y}$ for some $\mathbf{y} \in \mathbb{R}^{n_y}$ we
would like to estimate the random $n_x$-vector $\mathbf{X}$ using a (Borel measurable) function
$g\colon \mathbb{R}^{n_y} \to \mathbb{R}^{n_x}$ so as to minimize the estimation error

\[ \mathsf{E}\bigl[\|\mathbf{X} - g(\mathbf{Y})\|^2\bigr], \tag{23.65} \]

then an optimal choice for $g(\cdot)$ is the conditional expectation

\[ g(\mathbf{y}) = \mathsf{E}[\mathbf{X} \mid \mathbf{Y} = \mathbf{y}], \quad \mathbf{y} \in \mathbb{R}^{n_y}. \tag{23.66} \]

Theorem 23.7.4 demonstrates that if X and Y are jointly Gaussian and centered, 
then E[X| Y = y] is a linear function of y and is explicitly given by (23.59). Thus, 
for jointly Gaussian centered random vectors, there is no loss in optimality in 
limiting ourselves to linear estimators. 

The optimality of choosing $g(\cdot)$ as in (23.66) has a simple intuitive explanation. We 
first note that it suffices to establish the result when n x = 1, i.e., when estimating 
a random variable rather than a random vector. Indeed, the squared-norm error in 
estimating a random vector X with n x components is the sum of the squared errors 
in estimating its components. To minimize the sum, one should therefore minimize 
each of the terms. And the problem of estimating the j-th component of X based 
on the observation Y = y is a problem of estimating a random variable. Stated 
differently, to estimate X so as to minimize the error (23.65) we should separately 
estimate each of its components. 

Having established that it suffices to prove the optimality of (23.66) when n x = 1, 
we now assume that n x = 1 and denote the random variable to be estimated by X . 
To study how to estimate X after observing that Y = y, we first consider the 
case where there is no observation. In this case, the estimate is a constant, and 
by Lemma 14.4.1 the optimal choice of that constant is the mean E[X]. We now 
view the general case where we observe Y = y as though there were no observables 
but X had the a posteriori distribution given Y = y. Utilizing the result for the 
case where there are no observables yields that estimating X by E [X | Y = y] is 
optimal. 
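
A small Monte Carlo experiment illustrates the claim. This is only a sketch under assumptions of our own choosing: X and Y are scalar, centered, of unit variance, and of correlation coefficient `rho = 0.8`, so that (23.59) reduces to the estimator y -> rho*y. The two competing estimators are hypothetical alternatives picked for comparison.

```python
import math
import random

random.seed(0)
rho = 0.8      # assumed correlation coefficient of the unit-variance pair (X, Y)
n = 50_000     # number of simulated pairs

estimators = {
    "conditional mean": lambda y: rho * y,                   # (23.59) in the scalar case
    "identity": lambda y: y,                                 # ignores the scaling
    "nonlinear": lambda y: rho * y + 0.2 * math.sin(y),      # perturbed estimator
}

sq_err = {name: 0.0 for name in estimators}
for _ in range(n):
    y = random.gauss(0.0, 1.0)
    # X = rho*Y + sqrt(1 - rho^2)*W with W independent of Y has unit variance
    # and correlation coefficient rho with Y.
    x = rho * y + math.sqrt(1.0 - rho**2) * random.gauss(0.0, 1.0)
    for name, g in estimators.items():
        sq_err[name] += (x - g(y)) ** 2

mse = {name: total / n for name, total in sq_err.items()}
for name, value in mse.items():
    print(f"{name:>16}: empirical mean squared-error {value:.4f}")
```

By (23.60) the error of the conditional-mean estimator should be close to 1 - rho^2 = 0.36, and the other two estimators should do worse.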

23.8 Moments and Wick's Formula 

We next describe without proof a technique for computing moments of centered 
Gaussian vectors. A sketch of a proof can be found in (Zvonkin, 1997). 

Theorem 23.8.1 (Wick's Formula). Let $\mathbf{X}$ be a centered Gaussian n-vector and
let $g_1, \ldots, g_{2k}\colon \mathbb{R}^n \to \mathbb{R}$ be an even number of (not necessarily different) linear
functionals on $\mathbb{R}^n$. Then

\[ \mathsf{E}[g_1(\mathbf{X}) \, g_2(\mathbf{X}) \cdots g_{2k}(\mathbf{X})]
   = \sum \mathsf{E}[g_{p_1}(\mathbf{X}) \, g_{q_1}(\mathbf{X})] \; \mathsf{E}[g_{p_2}(\mathbf{X}) \, g_{q_2}(\mathbf{X})] \cdots \mathsf{E}[g_{p_k}(\mathbf{X}) \, g_{q_k}(\mathbf{X})], \tag{23.67} \]

where the summation is over all permutations $p_1, q_1, p_2, q_2, \ldots, p_k, q_k$ of $1, 2, \ldots, 2k$
such that

\[ p_1 < p_2 < \cdots < p_k \tag{23.68a} \]

and

\[ p_1 < q_1, \quad p_2 < q_2, \quad \ldots, \quad p_k < q_k. \tag{23.68b} \]

The number of terms on the RHS of (23.67) is $1 \times 3 \times 5 \times \cdots \times (2k - 1)$.

Example 23.8.2. Suppose that n = 1, so X is a centered univariate Gaussian.
Let $\sigma^2$ be its variance, and suppose we wish to compute $\mathsf{E}[X^4]$. We can express this
in the form of Theorem 23.8.1 with k = 2 and $g_1(x) = g_2(x) = g_3(x) = g_4(x) = x$.
By Wick's Formula

\begin{align*}
\mathsf{E}[X^4] &= \mathsf{E}[g_1(X) g_2(X)] \, \mathsf{E}[g_3(X) g_4(X)] + \mathsf{E}[g_1(X) g_3(X)] \, \mathsf{E}[g_2(X) g_4(X)] \\
  &\quad + \mathsf{E}[g_1(X) g_4(X)] \, \mathsf{E}[g_2(X) g_3(X)] \\
  &= 3\sigma^4,
\end{align*}

which is in agreement with (19.31). 

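Example 23.8.3. Suppose that $\mathbf{X}$ is a bivariate centered Gaussian whose components
are of unit variance and of correlation coefficient $\rho \in [-1, 1]$. We compute
$\mathsf{E}[(X^{(1)})^2 (X^{(2)})^2]$ using Theorem 23.8.1 by setting k = 2 and by defining
$g_1(\mathbf{x}) = g_2(\mathbf{x}) = x^{(1)}$ and $g_3(\mathbf{x}) = g_4(\mathbf{x}) = x^{(2)}$. By Wick's Formula

\begin{align*}
\mathsf{E}[(X^{(1)})^2 (X^{(2)})^2]
  &= \mathsf{E}[g_1(\mathbf{X}) g_2(\mathbf{X})] \, \mathsf{E}[g_3(\mathbf{X}) g_4(\mathbf{X})] + \mathsf{E}[g_1(\mathbf{X}) g_3(\mathbf{X})] \, \mathsf{E}[g_2(\mathbf{X}) g_4(\mathbf{X})] \\
  &\quad + \mathsf{E}[g_1(\mathbf{X}) g_4(\mathbf{X})] \, \mathsf{E}[g_2(\mathbf{X}) g_3(\mathbf{X})] \\
  &= \mathsf{E}[(X^{(1)})^2] \, \mathsf{E}[(X^{(2)})^2] + \mathsf{E}[X^{(1)} X^{(2)}] \, \mathsf{E}[X^{(1)} X^{(2)}] \\
  &\quad + \mathsf{E}[X^{(1)} X^{(2)}] \, \mathsf{E}[X^{(1)} X^{(2)}] \\
  &= 1 + 2\rho^2. \tag{23.69}
\end{align*}

Similarly,

\[ \mathsf{E}[(X^{(1)})^3 X^{(2)}] = 3\rho. \tag{23.70} \]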
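
Wick's Formula is easy to mechanize. The sketch below enumerates the pairings that satisfy (23.68a) and (23.68b) and sums the products of pairwise expectations. The function names `pairings` and `wick` are ours, and the numerical values of the variance and of the correlation coefficient are arbitrary choices for illustration.

```python
from math import prod

def pairings(indices):
    """Yield every way of splitting the sorted list `indices` into pairs (p, q).

    Each p is the smallest index not yet used, so the pairs come out with
    p_1 < p_2 < ... and p_j < q_j, as required by (23.68a) and (23.68b).
    """
    if not indices:
        yield []
        return
    p, rest = indices[0], indices[1:]
    for i, q in enumerate(rest):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(p, q)] + tail

def wick(cov):
    """Evaluate the RHS of (23.67), where cov[i][j] = E[g_i(X) g_j(X)] (0-based)."""
    idx = list(range(len(cov)))
    return sum(prod(cov[p][q] for p, q in pairing) for pairing in pairings(idx))

# Number of terms for 2k = 6 functionals: 1 x 3 x 5.
num_terms = sum(1 for _ in pairings(list(range(6))))

# Example 23.8.2 with variance 2: all four functionals are x -> x.
sigma2 = 2.0
fourth_moment = wick([[sigma2] * 4 for _ in range(4)])

# Example 23.8.3 with rho = 0.5: component[i] says which component g_i picks out.
rho = 0.5
def cov_for(component):
    return [[1.0 if a == b else rho for b in component] for a in component]

mixed_22 = wick(cov_for([0, 0, 1, 1]))   # E[(X1)^2 (X2)^2]
mixed_31 = wick(cov_for([0, 0, 0, 1]))   # E[(X1)^3 X2]

print(num_terms, fourth_moment, mixed_22, mixed_31)
```

The four printed values can be compared with 1 x 3 x 5, with 3 sigma^4, with (23.69), and with (23.70).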

23.9 The Limit of Gaussian Vectors Is a Gaussian Vector 

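The results of Section 19.9 on limits of Gaussian random variables extend to Gaussian
vectors. In this setting we consider random vectors $\mathbf{X}, \mathbf{X}_1, \mathbf{X}_2, \ldots$ defined
over the probability space $(\Omega, \mathcal{F}, P)$. We say that the sequence of random vectors
$\mathbf{X}_1, \mathbf{X}_2, \ldots$ converges to the random vector $\mathbf{X}$ with probability one or almost
surely if

\[ \Pr\bigl(\{\omega \in \Omega : \lim_{n \to \infty} \mathbf{X}_n(\omega) = \mathbf{X}(\omega)\}\bigr) = 1. \tag{23.71} \]

The sequence $\mathbf{X}_1, \mathbf{X}_2, \ldots$ converges to the random vector $\mathbf{X}$ in probability if

\[ \lim_{n \to \infty} \Pr\bigl[\|\mathbf{X}_n - \mathbf{X}\| \ge \epsilon\bigr] = 0, \quad \epsilon > 0. \tag{23.72} \]

The sequence $\mathbf{X}_1, \mathbf{X}_2, \ldots$ converges to the random vector $\mathbf{X}$ in mean square if

\[ \lim_{n \to \infty} \mathsf{E}\bigl[\|\mathbf{X}_n - \mathbf{X}\|^2\bigr] = 0. \tag{23.73} \]

Finally, the sequence of random vectors $\mathbf{X}_1, \mathbf{X}_2, \ldots$ taking value in $\mathbb{R}^d$ converges
to the random vector $\mathbf{X}$ weakly or in distribution if

\[ \lim_{n \to \infty} \Pr\bigl[X_n^{(1)} \le \xi^{(1)}, \ldots, X_n^{(d)} \le \xi^{(d)}\bigr] = \Pr\bigl[X^{(1)} \le \xi^{(1)}, \ldots, X^{(d)} \le \xi^{(d)}\bigr] \tag{23.74} \]

for every vector $\boldsymbol{\xi} \in \mathbb{R}^d$ such that

\[ \lim_{\epsilon \downarrow 0} \Pr\bigl[X^{(1)} \le \xi^{(1)} - \epsilon, \ldots, X^{(d)} \le \xi^{(d)} - \epsilon\bigr] = \Pr\bigl[X^{(1)} \le \xi^{(1)}, \ldots, X^{(d)} \le \xi^{(d)}\bigr]. \tag{23.75} \]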

In analogy to Theorem 19.9.1 we next show that, irrespective of which of the above 
forms of convergence we consider, if a sequence of Gaussian vectors converges to 
some random vector X, then X must be Gaussian. 

Theorem 23.9.1. Let the random d-vectors $\mathbf{X}, \mathbf{X}_1, \mathbf{X}_2, \ldots$ be defined over a common
probability space. Let $\mathbf{X}_1, \mathbf{X}_2, \ldots$ each be Gaussian (with possibly different
mean vectors and covariance matrices). If the sequence $\mathbf{X}_1, \mathbf{X}_2, \ldots$ converges to $\mathbf{X}$
in the sense of (23.71) or (23.72) or (23.73), then $\mathbf{X}$ must be Gaussian.

Proof. The proof is based on Theorem 23.6.17, which demonstrates that it suffices 
to consider linear functionals of the vectors in the sequence and on the analogous 
result for scalars (Theorem 19.9.1). We demonstrate the idea by considering the 
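case where the convergence is almost sure. If $\mathbf{X}_1, \mathbf{X}_2, \ldots$ converges almost surely
to $\mathbf{X}$, then for every $\boldsymbol{\alpha} \in \mathbb{R}^d$ the sequence $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}_1, \boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}_2, \ldots$ converges almost surely
to $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}$. Since, by Theorem 23.6.17, linear functionals of Gaussian vectors are
univariate Gaussians, it follows that the sequence $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}_1, \boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}_2, \ldots$ is a sequence of
Gaussian random variables. And since it converges almost surely to $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}$, it follows
from Theorem 19.9.1 that $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}$ must be Gaussian. Since this is true for every $\boldsymbol{\alpha}$
in $\mathbb{R}^d$, it follows from Theorem 23.6.17 that $\mathbf{X}$ must be a Gaussian vector. □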



In analogy to Theorem 19.9.2 we have the following result on weakly converging 
Gaussian vectors. 

Theorem 23.9.2 (Weakly Converging Gaussian Vectors). Let the sequence of
random d-vectors $\mathbf{X}_1, \mathbf{X}_2, \ldots$ be such that $\mathbf{X}_n \sim \mathcal{N}(\boldsymbol{\mu}_n, \mathsf{K}_n)$ for n = 1, 2, ... Then
the sequence converges in distribution to some limiting distribution if, and only if,
there exist some $\boldsymbol{\mu} \in \mathbb{R}^d$ and some $d \times d$ matrix $\mathsf{K}$ such that

\[ \boldsymbol{\mu}_n \to \boldsymbol{\mu} \quad \text{and} \quad \mathsf{K}_n \to \mathsf{K}. \tag{23.76} \]

And if the sequence does converge in distribution, then it converges to the multivariate
Gaussian distribution of mean vector $\boldsymbol{\mu}$ and covariance matrix $\mathsf{K}$.



Proof. See (Gikhman and Skorokhod, 1996, Chapter I, Section 3, Theorem 4). □ 

23.10 Additional Reading 

There are numerous books on Matrix Theory that discuss orthogonal matrices and 
positive semidefinite matrices. We mention here (Zhang, 1999, Section 5.2), (Her- 
stein, 2001, Chapter 6, Section 6.10), and (Axler, 1997, Chapter 7) on orthogonal 
matrices, and (Zhang, 1999, Chapter 6), (Axler, 1997, Chapter 7), and (Horn and 
Johnson, 1985, Chapter 7) on positive semidefinite matrices. Much more on the 
multivariate Gaussian can be found in (Tong, 1990) and (Johnson and Kotz, 1972, 
Chapter 35). For more on estimation and linear estimation, see Poor (1994) and 
(Kailath, Sayed, and Hassibi, 2000). 

23.11 Exercises 

Exercise 23.1 (Covariance Matrices). Which of the following matrices cannot be a co- 
variance matrix of some real random vector? 

*-G A). -G ;)• -a ?)■ -a I' 



Exercise 23.2 (An Orthogonal Matrix of Determinant 1). Show that in Theorem 23.6.14 
the orthogonal matrix U can be chosen to have determinant +1. 

Exercise 23.3 (A Mixture of Gaussians). Let $X \sim \mathcal{N}(\mu_x, \sigma_x^2)$ and $Y \sim \mathcal{N}(\mu_y, \sigma_y^2)$ be
Gaussian random variables. Let E take on the values 0 and 1 equiprobably and independently
of (X, Y). Define the mixture RV

\[ Z = \begin{cases} X & \text{if } E = 0, \\ Y & \text{if } E = 1. \end{cases} \]

Must Z be Gaussian? Can Z be Gaussian? Compute Z's characteristic function. 

Exercise 23.4 (Multivariate Gaussians). Show that if Z is a univariate Gaussian, then 
the random vector (Z, Z) is a Gaussian vector. What is its canonical representation? 

Exercise 23.5 (Manipulating Gaussians). Let $W_1, W_2, \ldots, W_5$ be IID $\mathcal{N}(0, 1)$. Define
$Y = 3W_1 + 4W_2 - 2W_3 + W_4 - W_5$ and $Z = W_1 - 4W_2 - 2W_3 + 3W_4 - W_5$. What is the
joint distribution of (Y, Z)?

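Exercise 23.6 (Largest Eigenvalue). Let $\mathbf{X}$ be a zero-mean Gaussian n-vector of covariance
matrix $\mathsf{K} \succeq 0$, and let $\lambda_{\max}$ denote the maximal eigenvalue of $\mathsf{K}$. Show that for some
random n-vector $\mathbf{Z}$ independent of $\mathbf{X}$

\[ \mathbf{X} + \mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \lambda_{\max} \mathsf{I}_n), \]

where $\mathsf{I}_n$ denotes the $n \times n$ identity matrix.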

Exercise 23.7 (The Error Probability Revisited). Show that p* (correct) of (21.68) in 
Problem 21.5 can be rewritten as 

p* (correct) = — exp f — — -^-J Elexp (- max{E (m) }J I , 

where (E* 1 ', . . . ; h' m ') t is a centered Gaussian with a covariance matrix whose Row-j 
Column-^ entry is {sj,se) E . 

Exercise 23.8 (Gaussian Marginals). Let X and Z be IID $\mathcal{N}(0, 1)$. Let $Y = |Z| \, \mathrm{sgn}(X)$,
where sgn(X) is 1 if $X \ge 0$ and is -1 otherwise. Show that X is Gaussian, that Y is 
Gaussian, but that they are not jointly Gaussian. Sketch the contour lines of their joint 
probability density function. 

Exercise 23.9 (Characteristic Function of a Random Vector). Let $\mathbf{X}$ be a random vector
with two components whose characteristic function is $\Phi_{\mathbf{X}}(\cdot)$. Express the characteristic
function of the sum of its components in terms of $\Phi_{\mathbf{X}}(\cdot)$.

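Exercise 23.10 (The Distribution of Linear Functionals). Let $\mathbf{X}$ and $\mathbf{Y}$ be random n-vectors
of components $X^{(1)}, \ldots, X^{(n)}$ and $Y^{(1)}, \ldots, Y^{(n)}$. Assume that for all deterministic
coefficients $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ the random variables $\sum_{\nu=1}^{n} \alpha_\nu X^{(\nu)}$ and $\sum_{\nu=1}^{n} \alpha_\nu Y^{(\nu)}$ have
the same distribution, i.e.,

\[ \sum_{\nu=1}^{n} \alpha_\nu X^{(\nu)} \stackrel{\mathscr{L}}{=} \sum_{\nu=1}^{n} \alpha_\nu Y^{(\nu)}, \quad \alpha_1, \ldots, \alpha_n \in \mathbb{R}. \]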



(i) Show that the characteristic function of X must be equal to that of Y. 
(ii) Show that X and Y must have the same distribution. 

Exercise 23.11 (Independence, Uncorrelatedness and Gaussianity). Let the random variables
X and H be independent with $X \sim \mathcal{N}(0, 1)$ and with H taking on the values ±1
equiprobably. Let Y = HX denote their product.

(i) Find the density of Y.

(ii) Are X and Y correlated?
(iii) Compute $\Pr[|X| > 1]$ and $\Pr[|Y| > 1]$.
(iv) Compute the probability that both $|X|$ and $|Y|$ exceed 1.

(v) Are X and Y independent?
(vi) Is the vector $(X, Y)^{\mathsf{T}}$ a Gaussian vector?

Exercise 23.12 (Expected Maximum of Jointly Gaussians). 

(i) Let $(X_1, X_2, \ldots, X_n, Y)$ have an arbitrary joint distribution with E[Y] = 0. Here
Y need not be independent of $(X_1, X_2, \ldots, X_n)$. Prove that

\[ \mathsf{E}\Bigl[\max_{1 \le j \le n} \{X_j + Y\}\Bigr] = \mathsf{E}\Bigl[\max_{1 \le j \le n} \{X_j\}\Bigr]. \]

(ii) Use Part (i) to prove that if (U, V) are jointly Gaussian and of zero mean, then 



^mvr^fM^pi 

Exercise 23.13 (The Density of a Bivariate Gaussian). Let X and Y be jointly Gaussian
with means $\mu_x$ and $\mu_y$ and with positive variances $\sigma_x^2$ and $\sigma_y^2$. Let

\[ \rho = \frac{\mathrm{Cov}[X, Y]}{\sigma_x \sigma_y} \]

be their correlation coefficient. Assume $|\rho| < 1$.

(i) Find the joint density of X and Y.
(ii) Find the conditional density of X given Y = y.

Exercise 23.14 (A Training Symbol). Conditional on $(X_1, X_2) = (x_1, x_2)$, the observable
$(Y_1, Y_2)$ is given by

\[ Y_\nu = A x_\nu + Z_\nu, \quad \nu = 1, 2, \]

where $Z_1$, $Z_2$, and A are independent with $Z_1, Z_2 \sim$ IID $\mathcal{N}(0, \sigma^2)$ and $A \sim \mathcal{N}(0, 1)$.
Suppose that $X_1 = 1$ (deterministically) and that $X_2$ takes on the values ±1 equiprobably.

(i) Derive an optimal rule for guessing $X_2$ based on $(Y_1, Y_2)$.

(ii) Consider a decoder that operates in two stages. In the first stage the decoder
estimates A from $Y_1$ with an estimator that minimizes the mean squared-error.
In the second stage it uses the ML decoding rule for guessing $X_2$ based on $Y_2$
by pretending that A is given by its estimate from the first stage. Compute the
probability of error of this decoder. Is it optimal?

Exercise 23.15 (On Wick's Formula). Let $\mathbf{X}$ be a centered Gaussian n-vector, and let
$g_1, \ldots, g_{2k+1}\colon \mathbb{R}^n \to \mathbb{R}$ be an odd number of (not necessarily different) linear functionals
from $\mathbb{R}^n$ to $\mathbb{R}$. Show that

\[ \mathsf{E}[g_1(\mathbf{X}) \, g_2(\mathbf{X}) \cdots g_{2k+1}(\mathbf{X})] = 0. \]

Exercise 23.16 (Jointly Gaussians with Positive Correlation). Let X and Y be jointly
Gaussian with means $\mu_x$ and $\mu_y$; positive variances $\sigma_x^2$ and $\sigma_y^2$; and correlation coefficient $\rho$
as in Exercise 23.13 satisfying $|\rho| < 1$.

(i) Show that, conditional on Y = y, the distribution of X is Gaussian with mean
$\mu_x + \rho \frac{\sigma_x}{\sigma_y} (y - \mu_y)$ and variance $\sigma_x^2 (1 - \rho^2)$.

(ii) Show that if $\rho > 0$, then the family $f_{X|Y}(x|y)$ has the monotone likelihood ratio
property that the mapping

\[ x \mapsto \frac{f_{X|Y}(x|y)}{f_{X|Y}(x|y')} \]

is nondecreasing whenever $y' < y$. Here $f_{X|Y}(\cdot|y)$ is the conditional density of X
given Y = y.

(iii) Show that if $\rho > 0$, then the joint density $f_{X,Y}(\cdot)$ has the Total Positivity of
Order 2 (TP$_2$) property, i.e.,

\[ f_{X,Y}(x', y) \, f_{X,Y}(x, y') \le f_{X,Y}(x, y) \, f_{X,Y}(x', y'), \quad (x' < x, \; y' < y). \]

See (Tong, 1990, Chapter 4, Section 4.3.1, Fact 4.3.1 and Theorem 4.3.1).
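
Exercise 23.17 (Price's Theorem). Let $\mathbf{X}$ be a centered Gaussian n-vector of covariance
matrix $\Lambda$. Let $\lambda^{(j,\ell)} = \mathsf{E}[X^{(j)} X^{(\ell)}]$ be the Row-j Column-$\ell$ entry of $\Lambda$. Let $f_{\mathbf{X}}(\mathbf{x}; \Lambda)$
denote the density of $\mathbf{X}$ (when $\Lambda$ is nonsingular).

(i) Expressing the FT of the partial derivative of a function in terms of the FT of the
original function and using the characteristic function of a Gaussian (23.46), derive
Plackett's Identities

\[ \frac{\partial f_{\mathbf{X}}(\mathbf{x}; \Lambda)}{\partial \lambda^{(j,j)}} = \frac{1}{2} \frac{\partial^2 f_{\mathbf{X}}(\mathbf{x}; \Lambda)}{\partial (x^{(j)})^2}, \qquad
   \frac{\partial f_{\mathbf{X}}(\mathbf{x}; \Lambda)}{\partial \lambda^{(j,\ell)}} = \frac{\partial^2 f_{\mathbf{X}}(\mathbf{x}; \Lambda)}{\partial x^{(j)} \, \partial x^{(\ell)}}, \quad j \ne \ell. \]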

(ii) Using integration by parts, derive Price's Theorem: if $h\colon \mathbb{R}^n \to \mathbb{R}$ is twice continuously
differentiable with h and its first and second derivatives growing at most
polynomially in $\|\mathbf{x}\|$ as $\|\mathbf{x}\| \to \infty$, then

(See (Adler, 1990, Chapter 2, Section 2.2) for the case where A is singular.) 

(iii) Show that if in addition to the assumptions of Part (ii) we also assume that for
some $j \ne \ell$

then $\mathsf{E}[h(\mathbf{X})]$ is a nondecreasing function of $\lambda^{(j,\ell)}$.

(iv) Conclude that if $h(\mathbf{x}) = \prod_{\nu=1}^{n} g_\nu(x^{(\nu)})$, where for each $\nu \in \{1, \ldots, n\}$ the function
$g_\nu\colon \mathbb{R} \to \mathbb{R}$ is nonnegative, nondecreasing, twice continuously differentiable, and
satisfying the growth conditions of h in Part (ii), then

\[ \mathsf{E}\Bigl[\prod_{\nu=1}^{n} g_\nu(X^{(\nu)})\Bigr] \]

is monotonically nondecreasing in $\lambda^{(j,\ell)}$ whenever $j \ne \ell$.
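(v) By choosing $g_\nu(\cdot)$ to approximate the step function $\alpha \mapsto \mathrm{I}\{\alpha \ge \xi^{(\nu)}\}$ for properly
chosen $\xi^{(\nu)}$, prove Slepian's Inequality: if $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \Lambda)$, then for every choice of
$\xi^{(1)}, \ldots, \xi^{(n)} \in \mathbb{R}$

\[ \Pr\bigl[X^{(1)} \ge \xi^{(1)}, \ldots, X^{(n)} \ge \xi^{(n)}\bigr] \]

is monotonically nondecreasing in $\lambda^{(j,\ell)}$ whenever $j \ne \ell$. See (Tong, 1990, Chapter 5,
Section 5.1.4, Theorem 5.1.7).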

(vi) Modify the arguments in Parts (iv) and (v) to show that if $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \Lambda)$, then for
every choice of $\xi^{(1)}, \ldots, \xi^{(n)} \in \mathbb{R}$

\[ \Pr\bigl[X^{(1)} \le \xi^{(1)}, \ldots, X^{(n)} \le \xi^{(n)}\bigr] \]

is monotonically nondecreasing in $\lambda^{(j,\ell)}$ whenever $j \ne \ell$. See (Adler, 1990, Chapter 2,
Section 2.2, Corollary 2.4).

Exercise 23.18 (Jointly Gaussians of Equal Sign). Let X and Y be jointly Gaussian and
centered with positive variances and correlation coefficient $\rho$. Prove that

\[ \Pr[XY > 0] = \frac{1}{2} + \frac{\phi}{\pi}, \]

where $-\pi/2 \le \phi \le \pi/2$ is such that $\sin\phi = \rho$. We propose the following approach.

(i) Show that it suffices to prove the result when X and Y are of unit variance.
(ii) Show that, for such X and Y, if we define

\[ W = \frac{1}{\sqrt{1 - \rho^2}} X - \frac{\rho}{\sqrt{1 - \rho^2}} Y, \qquad Z = Y, \]

then W and Z are IID $\mathcal{N}(0, 1)$.
(iii) Show that X and Y can be expressed as

\[ X = R \sin(\Theta + \phi), \qquad Y = R \cos\Theta, \]

where $\phi$ is as defined before, $\Theta$ is uniformly distributed over the interval $[-\pi, \pi)$,
R is independent of $\Theta$, and $f_R(r) = r e^{-r^2/2} \, \mathrm{I}\{r \ge 0\}$.
(iv) Justify the calculation

\begin{align*}
\Pr[XY > 0] &= 2 \Pr[X > 0, Y > 0] \\
  &= 2 \Pr[\sin(\Theta + \phi) > 0, \cos\Theta > 0] \\
  &= \frac{1}{2} + \frac{\phi}{\pi}.
\end{align*}

Hint: Exercise 19.7 may be useful for Part (iii).



Chapter 24 

Complex Gaussians and Circular Symmetry 

24.1 Introduction 

This chapter introduces the complex Gaussian distribution and the circular sym- 
metry property. We start with the scalar case and then extend these notions to 
random vectors. We rely heavily on Chapter 17 for the basic properties of complex 
random variables and on Chapter 23 for the properties of the multivariate Gaussian 
distribution. 

24.2 Scalars 

24.2.1 Standard Complex Gaussians 

Definition 24.2.1 (Standard Complex Gaussian). A standard complex Gaussian
is a complex random variable whose real and imaginary parts are independent
$\mathcal{N}(0, 1/2)$ random variables.

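If W is a standard complex Gaussian, then its density is given by

\[ f_W(w) = \frac{1}{\pi} e^{-|w|^2}, \quad w \in \mathbb{C}, \tag{24.1} \]

because

\begin{align*}
f_W(w) &= f_{\mathrm{Re}(W), \mathrm{Im}(W)}\bigl(\mathrm{Re}(w), \mathrm{Im}(w)\bigr) \\
  &= f_{\mathrm{Re}(W)}\bigl(\mathrm{Re}(w)\bigr) \, f_{\mathrm{Im}(W)}\bigl(\mathrm{Im}(w)\bigr) \\
  &= \frac{1}{\sqrt{\pi}} e^{-\mathrm{Re}(w)^2} \, \frac{1}{\sqrt{\pi}} e^{-\mathrm{Im}(w)^2} \\
  &= \frac{1}{\pi} e^{-|w|^2}, \quad w \in \mathbb{C},
\end{align*}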

where the first equality follows from the definition of the density $f_W(w)$ of a
CRV W at $w \in \mathbb{C}$ as the joint density $f_{\mathrm{Re}(W), \mathrm{Im}(W)}$ of its real and imaginary
parts (Re(W), Im(W)) evaluated at (Re(w), Im(w)) (Section 17.3.1); the second
because the real and imaginary parts of a standard complex Gaussian are independent;
the third because the real and imaginary parts of a standard complex Gaussian are
zero-mean variance-1/2 real Gaussians whose density can thus be computed from
(19.6) (by substituting 1/2 for $\sigma^2$); and where the final equality follows because for
any complex number w we have $\mathrm{Re}(w)^2 + \mathrm{Im}(w)^2 = |w|^2$.

Because the real and imaginary parts of a standard complex Gaussian W are of 
zero mean, it follows that 

\begin{align*}
\mathsf{E}[W] &= \mathsf{E}[\mathrm{Re}(W)] + \mathrm{i} \, \mathsf{E}[\mathrm{Im}(W)] \\
  &= 0.
\end{align*}

And because they are each of variance 1/2, it follows from (17.14c) that a standard
complex Gaussian W has unit variance

\[ \mathrm{Var}[W] = \mathsf{E}\bigl[|W|^2\bigr] = 1. \tag{24.2} \]

Moreover, since a standard complex Gaussian is of zero mean and since its real
and imaginary parts are of equal variance and uncorrelated, a standard complex Gaussian
is proper (Definition 17.3.1 and Proposition 17.3.2), i.e.,

\[ \mathsf{E}[W] = 0 \quad \text{and} \quad \mathsf{E}[W^2] = 0. \tag{24.3} \]
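
These moments are easy to confirm by simulation. The following sketch draws samples of a standard complex Gaussian according to Definition 24.2.1 and estimates the quantities in (24.2) and (24.3); the sample size and the seed are arbitrary choices of ours.

```python
import math
import random

random.seed(1)
n = 100_000
s = math.sqrt(0.5)   # standard deviation of the real and of the imaginary part

# Real and imaginary parts are independent N(0, 1/2) random variables.
samples = [complex(random.gauss(0.0, s), random.gauss(0.0, s)) for _ in range(n)]

mean = sum(samples) / n                                   # estimates E[W]
second_moment = sum(abs(w) ** 2 for w in samples) / n     # estimates E[|W|^2]
pseudo_variance = sum(w * w for w in samples) / n         # estimates E[W^2]

print("E[W]     ~", mean)
print("E[|W|^2] ~", second_moment)
print("E[W^2]   ~", pseudo_variance)
```

Up to Monte Carlo fluctuations, the first and third estimates should be near zero and the second near one.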

Finally note that, by (24.1), the density $f_W(\cdot)$ of a standard complex Gaussian
is radially-symmetric, i.e., its value at $w \in \mathbb{C}$ depends on w only via its modulus
$|w|$. A CRV whose density is radially-symmetric is said to be circularly-symmetric,
but the definition of circular symmetry applies also to complex random
variables that do not have a density. This is the topic of the next section.

24.2.2 Circular Symmetry 

Definition 24.2.2 (Circularly-Symmetric CRV). A CRV Z is said to be circularly-symmetric
if for any deterministic $\phi \in [-\pi, \pi)$ the distribution of $e^{\mathrm{i}\phi} Z$ is identical
to the distribution of Z:

\[ e^{\mathrm{i}\phi} Z \stackrel{\mathscr{L}}{=} Z, \quad \phi \in [-\pi, \pi). \tag{24.4} \]

Note 24.2.3. If the expectation of a circularly-symmetric CRV is defined, then it 
must be zero. 

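Proof. Let Z be circularly-symmetric. It then follows from (24.4) that $e^{\mathrm{i}\phi} Z$ and Z
are of equal expectation, so

\begin{align*}
\mathsf{E}[Z] &= \mathsf{E}[e^{\mathrm{i}\phi} Z] \\
  &= e^{\mathrm{i}\phi} \, \mathsf{E}[Z], \quad \phi \in [-\pi, \pi),
\end{align*}

which, by considering a $\phi$ for which $e^{\mathrm{i}\phi} \ne 1$, implies that E[Z] must be zero. □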

To shed some light on the definition of circular symmetry we shall need Proposition 24.2.5
ahead, which is highly intuitive but a bit cumbersome to state. Before
stating it we provide its discrete counterpart, which is a bit easier to state: it
makes formal the intuition that if after giving the wheel-of-fortune an arbitrary
spin, you give it another fair spin, then the combined result is a fair spin that does
not depend on the initial spin. The case $\eta = 2$ is critical in cryptography. It shows
that taking the mod-2 sum of a binary source sequence with a sequence of IID
random bits results in a sequence that is independent of the source sequence.

Proposition 24.2.4. Fix a positive integer $\eta$, and define the set $\mathcal{A} = \{0, \ldots, \eta - 1\}$.
Let N be a RV taking value in the set $\mathcal{A}$. Then the following statements are
equivalent:

(a) The RV N is uniformly distributed over the set $\mathcal{A}$.

(b) For any integer-valued RV K that is independent of N, the RV $(N + K) \bmod \eta$
is independent of K and uniformly distributed over $\mathcal{A}$.¹

Proof. We first show (b) $\Rightarrow$ (a). To this end, define K to be a RV that takes on
the value zero deterministically. Being deterministic, it is independent of every
RV, and in particular of N. Statement (b) thus guarantees that $(N + 0) \bmod \eta$ is
uniformly distributed over $\mathcal{A}$. Since we have assumed from the outset that N takes
value in $\mathcal{A}$, it follows that $(N + 0) \bmod \eta = N$, so the uniformity of $(N + 0) \bmod \eta$
over $\mathcal{A}$ implies the uniformity of N over $\mathcal{A}$.

We next show (a) $\Rightarrow$ (b). To this end, we need to show that if N is uniformly
distributed over $\mathcal{A}$ and if K is independent of N, then²

\[ \Pr\bigl[((N + K) \bmod \eta) = a \bigm| K = k\bigr] = \frac{1}{\eta}, \quad (k \in \mathbb{Z}, \; a \in \mathcal{A}). \tag{24.5} \]

By the independence of N and K it follows that

\[ \Pr\bigl[((N + K) \bmod \eta) = a \bigm| K = k\bigr] = \Pr\bigl[((N + k) \bmod \eta) = a\bigr], \quad (k \in \mathbb{Z}, \; a \in \mathcal{A}), \]

so to prove (24.5) it suffices to prove

\[ \Pr\bigl[((N + k) \bmod \eta) = a\bigr] = \frac{1}{\eta}, \quad (k \in \mathbb{Z}, \; a \in \mathcal{A}). \tag{24.6} \]

This can be proved as follows. Because N is uniformly distributed over $\mathcal{A}$, it follows
that N + k is uniformly distributed over the set $\{k, k + 1, \ldots, k + \eta - 1\}$. And, because
the mapping $m \mapsto (m \bmod \eta)$ is a one-to-one mapping from $\{k, k + 1, \ldots, k + \eta - 1\}$
onto $\mathcal{A}$, this implies that $(N + k) \bmod \eta$ is also uniformly distributed over $\mathcal{A}$, thus
establishing (24.6). □
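
The direction (a) implies (b) can also be verified exactly for a concrete case. The sketch below uses exact rational arithmetic; the modulus `eta = 4` and the distribution `p_K` of K are arbitrary choices of ours. It builds the joint distribution of ((N + K) mod eta, K) and checks that it factors as the uniform distribution times the distribution of K.

```python
from fractions import Fraction
from itertools import product

eta = 4
A = range(eta)

p_N = {a: Fraction(1, eta) for a in A}     # N is uniform over A
p_K = {-3: Fraction(1, 2),                 # an arbitrary integer-valued K
        0: Fraction(1, 6),
        5: Fraction(1, 3)}

# Joint distribution of (S, K), where S = (N + K) mod eta and N, K are independent.
joint = {}
for (a, pa), (k, pk) in product(p_N.items(), p_K.items()):
    s = (a + k) % eta      # for eta > 0, Python's % returns a value in {0, ..., eta - 1}
    joint[(s, k)] = joint.get((s, k), Fraction(0)) + pa * pk

# S is uniform and independent of K iff Pr[S = s, K = k] = (1/eta) Pr[K = k] for all s, k.
uniform_and_independent = all(
    joint.get((s, k), Fraction(0)) == Fraction(1, eta) * pk
    for s in A
    for k, pk in p_K.items()
)
print(uniform_and_independent)   # prints True
```

With eta = 2 this is the cryptographic statement above: adding an independent fair bit modulo 2 yields a fair bit that is independent of the source bit.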

Proposition 24.2.5. Let $\Theta$ be a RV taking value in $[-\pi, \pi)$. Then the following
statements are equivalent:

(a) The RV $\Theta$ is uniformly distributed over $[-\pi, \pi)$.

(b) For any real RV $\Phi$ that is independent of $\Theta$, the RV $(\Theta + \Phi) \bmod [-\pi, \pi)$ is
independent of $\Phi$ and uniformly distributed over the interval $[-\pi, \pi)$.³

Figure 24.1: The function $\xi \mapsto (\xi \bmod [-\pi, \pi))$ plotted for $\xi \in [-\pi + \phi, \pi + \phi)$.

1 Here $m \bmod \eta$ is the remainder of dividing m by $\eta$, i.e., the unique $\nu \in \mathcal{A}$ such that $m - \nu$
is an integer multiple of $\eta$. E.g. 17 mod 8 = 1.

2 Recall that the random variables X and Y are independent if, and only if, the conditional
distribution of X given Y is equal to the marginal distribution of X.

Proof. The proof is similar to the proof of Proposition 24.2.4 but with an added
twist. The twist is needed because if X has a uniform density and if a function g
is one-to-one (injective) and onto (surjective), then g(X) need not be uniformly
distributed. (For example, if $X \sim \mathcal{U}([0, 1])$ and if $g\colon [0, 1] \to [0, 1]$ maps $\xi$ to $\xi^2$,
then g(X) is not uniform.)

To prove that (b) implies (a) we simply apply (b) to the deterministic RV $\Phi = 0$.

We next prove that (a) implies (b). As in the discrete case, it suffices to show that
if $\Theta$ is uniformly distributed over $[-\pi, \pi)$, then for any deterministic $\phi \in \mathbb{R}$ the
distribution of $(\Theta + \phi) \bmod [-\pi, \pi)$ is uniform over $[-\pi, \pi)$, irrespective of $\phi$. To
this end we first note that because $\Theta$ is uniform over $[-\pi, \pi)$ it follows that $\Theta + \phi$ is
uniform over $[\phi - \pi, \phi + \pi)$. Consider now the mapping $g\colon [\phi - \pi, \phi + \pi) \to [-\pi, \pi)$
defined by $g\colon \xi \mapsto (\xi \bmod [-\pi, \pi))$. This function is a one-to-one mapping onto
$[-\pi, \pi)$ and is differentiable except at the point $\xi^* \in [\phi - \pi, \phi + \pi)$ satisfying
$\xi^* \bmod [-\pi, \pi) = -\pi$, i.e., the point $\xi^* \in [\phi - \pi, \phi + \pi)$ of the form $\xi^* = 2\pi m + \pi$ for
some integer m. At all other points its derivative is 1; see Figure 24.1. (Incidentally,
$-\pi + \phi$ is mapped to a negative number if $\phi < \xi^*$ and to a positive number if
$\phi > \xi^*$. In Figure 24.1 we assume the latter.) Applying the formula for computing
the density of g(X) from the density of X (Theorem 17.3.4) we find that if $\Theta + \phi$
is uniform over $[\phi - \pi, \phi + \pi)$, then $g(\Theta + \phi)$ is uniform over $[-\pi, \pi)$. □
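
To make the wrapping operation concrete, here is a minimal sketch of the map x -> x mod [-pi, pi) together with a deterministic sanity check of the proposition. The function name `wrap`, the shift `phi = 1.0`, and the use of an equally spaced grid as a stand-in for a uniform random variable are all our own illustrative choices.

```python
import math

def wrap(x):
    """Return x mod [-pi, pi): the xi in [-pi, pi) with x - xi a multiple of 2*pi.

    Floating-point rounding can matter when x + pi is a tiny negative number;
    this sketch ignores that corner case.
    """
    return (x + math.pi) % (2.0 * math.pi) - math.pi

num_points, num_bins, phi = 3600, 12, 1.0
counts = [0] * num_bins
for i in range(num_points):
    # Midpoint grid over [-pi, pi) standing in for a uniformly distributed Theta.
    theta = -math.pi + (i + 0.5) * 2.0 * math.pi / num_points
    w = wrap(theta + phi)
    # Histogram bin of w when [-pi, pi) is cut into num_bins equal intervals.
    bin_index = min(num_bins - 1, int((w + math.pi) / (2.0 * math.pi) * num_bins))
    counts[bin_index] += 1

print(counts)
```

The shifted and wrapped grid fills the bins evenly, which is the discrete shadow of the statement that (Theta + phi) mod [-pi, pi) is again uniform.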

With the aid of Proposition 24.2.5 we can now give alternative characterizations 
of circular symmetry. 



3 Here $x \bmod [-\pi, \pi)$ is the unique $\xi \in [-\pi, \pi)$ such that $x - \xi$ is an integer multiple of $2\pi$.

Proposition 24.2.6 (Characterizing Circular Symmetry). Let Z be a CRV with
a density. Then each of the following statements is equivalent to the statement
that Z is circularly-symmetric:

(a) The distribution of $e^{\mathrm{i}\phi} Z$ is identical to the distribution of Z, for any deterministic
$\phi \in [-\pi, \pi)$.

(b) The CRV Z has a radially-symmetric density function, i.e., a density $f_Z(\cdot)$
whose value at z depends on z only via its modulus $|z|$.

(c) The CRV Z can be written as $Z = R e^{\mathrm{i}\Theta}$, where $R \ge 0$ and $\Theta$ are independent
real random variables and $\Theta \sim \mathcal{U}([-\pi, \pi))$.

Proof. Statement (a) is the definition of circular symmetry (Definition 24.2.2). 

The proof of (a) $\Rightarrow$ (b) is slightly obtuse because the density of a CRV is not
unique.⁴ We begin by noting that if Z is of density $f_Z(\cdot)$, then by (17.34) the
CRV $e^{\mathrm{i}\phi} Z$ is of density $w \mapsto f_Z(e^{-\mathrm{i}\phi} w)$. Thus, if $Z \stackrel{\mathscr{L}}{=} e^{\mathrm{i}\phi} Z$ and if Z is of density
$f_Z(\cdot)$, then Z is also of density $w \mapsto f_Z(e^{-\mathrm{i}\phi} w)$. Consequently, if Z is circularly-symmetric,
then for every $\phi \in [-\pi, \pi)$ the mapping $w \mapsto f_Z(e^{-\mathrm{i}\phi} w)$ is a density
for Z. We can therefore conclude that the mapping

\[ w \mapsto \frac{1}{2\pi} \int_{-\pi}^{\pi} f_Z(e^{-\mathrm{i}\phi} w) \, \mathrm{d}\phi \]

is also a density for Z, and this function is radially-symmetric.

The fact that (b) $\Rightarrow$ (c) follows because if we define R to be the magnitude of Z
and $\Theta$ to be its argument, then $Z = R e^{\mathrm{i}\Theta}$, and

\begin{align*}
f_{R,\Theta}(r, \theta) &= r f_Z(r e^{\mathrm{i}\theta}) \\
  &= r f_Z(r) \\
  &= \bigl(2\pi r f_Z(r)\bigr) \frac{1}{2\pi},
\end{align*}

where the first equality follows from (17.29) and the second from our assumption
that $f_Z(z)$ depends on z only via its modulus $|z|$. The joint density of R, $\Theta$ is thus
of a product form, thereby indicating that R and $\Theta$ are independent. And it does
not depend on $\theta$, thus indicating that the marginal of $\Theta$ is uniform.

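We finally show that (c) $\Rightarrow$ (a). To that end we assume that $R \ge 0$ and $\Theta$ are
independent with $\Theta$ being uniformly distributed over $[-\pi, \pi)$ and proceed to show
that $R e^{\mathrm{i}\Theta}$ is circularly-symmetric, i.e., that

\[ R e^{\mathrm{i}\Theta} \stackrel{\mathscr{L}}{=} R e^{\mathrm{i}(\Theta + \phi)}, \quad \phi \in [-\pi, \pi). \tag{24.7} \]

To prove (24.7) we note that

\begin{align*}
e^{\mathrm{i}(\Theta + \phi)} &= e^{\mathrm{i}((\Theta + \phi) \bmod [-\pi, \pi))} \\
  &\stackrel{\mathscr{L}}{=} e^{\mathrm{i}\Theta}, \tag{24.8}
\end{align*}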



4 And not all the functions that are densities for a given circularly-symmetric CRV Z are 
radially-symmetric. The radial symmetry can be broken on a set of Lebesgue measure zero. We 
can therefore only claim that there exists "a" radially-symmetric density function for Z. 
where the first equality follows from the periodicity of the complex exponentials,
and where the equality in distribution follows from Proposition 24.2.5 because
$\Theta \sim \mathcal{U}([-\pi, \pi))$. The proof is now completed by noting that (24.7) follows from
(24.8) and from the independence of R and $\Theta$. (If X is independent of Y, if X is
independent of Z, and if $Y \stackrel{\mathscr{L}}{=} Z$, then $(X, Y) \stackrel{\mathscr{L}}{=} (X, Z)$ and hence $XY \stackrel{\mathscr{L}}{=} XZ$.) □

Example 24.2.7. Let the CRV Z be given by $Z = e^{\mathrm{i}\Phi}$, where $\Phi \sim \mathcal{U}([-\pi, \pi))$.
Then Z is uniformly distributed over the unit circle $\{z : |z| = 1\}$ and is circularly-symmetric.
It does not have a density.

24.2.3 Properness and Circular Symmetry 

Proposition 24.2.8. Every finite-variance circularly-symmetric CRV is proper. 

Proof. Let Z be a finite-variance circularly-symmetric CRV. By Note 24.2.3 it
follows that E[Z] = 0. To conclude the proof it remains to show that $\mathsf{E}[Z^2] = 0$.
To this end we note that

\begin{align*}
\mathsf{E}[Z^2] &= e^{-\mathrm{i}2\phi} \, \mathsf{E}\bigl[(e^{\mathrm{i}\phi} Z)^2\bigr] \\
  &= e^{-\mathrm{i}2\phi} \, \mathsf{E}[Z^2], \quad \phi \in [-\pi, \pi), \tag{24.9}
\end{align*}

where the first equality follows by rewriting $Z^2$ as $e^{-\mathrm{i}2\phi} (e^{\mathrm{i}\phi} Z)^2$, and where the
second equality follows because the circular symmetry of Z guarantees that Z
and $e^{\mathrm{i}\phi} Z$ have the same law, so the expectation of their squares must be equal.
But (24.9) cannot be satisfied for all $\phi \in [-\pi, \pi)$ (or for that matter for any $\phi$ such
that $e^{\mathrm{i}2\phi} \ne 1$) unless $\mathsf{E}[Z^2] = 0$. □

Note 24.2.9. Not every proper CRV is circularly-symmetric. 

Proof. Consider the CRV Z that takes on the four values 1 + i, 1 - i, -1 + i, and
-1 - i equiprobably. Its real and imaginary parts are independent, each taking on
the values ±1 equiprobably. Computing E[Z] and $\mathsf{E}[Z^2]$ we find that they are both
zero, so Z is proper. To see that Z is not circularly-symmetric consider the random
variable $e^{\mathrm{i}\pi/4} Z$. Its distribution is different from the distribution of Z because Z
takes value in the set {1 + i, -1 + i, 1 - i, -1 - i}, and $e^{\mathrm{i}\pi/4} Z$ takes value in the
rotated set $\{\sqrt{2}, -\sqrt{2}, \sqrt{2}\,\mathrm{i}, -\sqrt{2}\,\mathrm{i}\}$. □
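
The counterexample is small enough to check by direct computation. The sketch below evaluates E[Z] and E[Z^2] for the four equiprobable points and lists the support of the rotated variable; the variable names are ours.

```python
import cmath

points = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]   # equiprobable values of Z

mean = sum(points) / 4                          # E[Z]
pseudo_variance = sum(z * z for z in points) / 4   # E[Z^2]

# Support of exp(i*pi/4) * Z: the square constellation rotated by 45 degrees.
rotated = [cmath.exp(1j * cmath.pi / 4) * z for z in points]

print("E[Z] =", mean, " E[Z^2] =", pseudo_variance)
print("rotated support:", rotated)
```

Both moments vanish, so Z is proper, while every rotated point lies (up to rounding) on the real or on the imaginary axis. None of the original four points does, so the two supports differ.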

The fact that not every proper CRV is circularly-symmetric is not surprising be- 
cause whether a CRV is proper or not is determined solely by its mean and by the 
covariance matrix of its real and imaginary parts, whereas circular symmetry has 
to do with the entire distribution. 



24.2.4 Complex Gaussians 

The definition of a complex Gaussian builds on the definition of a real Gaussian 
vector (Definition 23.1.1). 

Definition 24.2.10 (Complex Gaussian). A complex Gaussian is a CRV whose 
real and imaginary parts are jointly Gaussian real random variables. A centered 
complex Gaussian is a complex Gaussian of zero mean. 

An example of a complex Gaussian is the standard complex Gaussian, which we 
encountered in Section 24.2.1. 

The class of complex Gaussians is closed under multiplication by deterministic
complex numbers. Thus, if Z is a complex Gaussian and if $\alpha \in \mathbb{C}$ is deterministic,
then $\alpha Z$ is also a complex Gaussian. Indeed,

\[ \begin{pmatrix} \mathrm{Re}(\alpha Z) \\ \mathrm{Im}(\alpha Z) \end{pmatrix}
   = \begin{pmatrix} \mathrm{Re}(\alpha) & -\mathrm{Im}(\alpha) \\ \mathrm{Im}(\alpha) & \mathrm{Re}(\alpha) \end{pmatrix}
     \begin{pmatrix} \mathrm{Re}(Z) \\ \mathrm{Im}(Z) \end{pmatrix}, \]

so the claim follows from the fact that multiplying a real Gaussian vector by a
deterministic real matrix results in a real Gaussian vector (Proposition 23.6.3).
We leave it to the reader to verify that, more generally, if Z is a complex Gaussian
and if $\alpha, \beta \in \mathbb{C}$ are deterministic, then $\alpha Z + \beta Z^*$ is also a complex Gaussian. (This
is a special case of Proposition 24.3.9 ahead.)
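
The matrix identity above is just complex multiplication written in real coordinates. A quick numerical check, with an arbitrary `alpha` and an arbitrary realization `z` chosen by us, might look as follows.

```python
alpha = complex(0.6, -1.3)   # arbitrary deterministic complex number
z = complex(-0.4, 2.5)       # one realization of Z

# Real 2x2 matrix that represents multiplication by alpha.
M = [[alpha.real, -alpha.imag],
     [alpha.imag,  alpha.real]]

v = [z.real, z.imag]
via_matrix = [M[0][0] * v[0] + M[0][1] * v[1],
              M[1][0] * v[0] + M[1][1] * v[1]]

direct = alpha * z
print(via_matrix, (direct.real, direct.imag))
```

The two printed pairs agree, which is all that the closure argument needs before invoking Proposition 23.6.3.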

Not every centered complex Gaussian can be expressed as the scaling of a standard 
complex Gaussian by some complex number. But the following result characterizes 
those that can: 

Proposition 24.2.11.

(i) For every centered complex Gaussian Z we can find coefficients $\alpha, \beta \in \mathbb{C}$ so
that

\[ Z \stackrel{\mathscr{L}}{=} \alpha W + \beta W^*, \tag{24.10} \]

where W is a standard complex Gaussian.

(ii) A centered complex Gaussian Z is proper if, and only if, there exists some
$\alpha \in \mathbb{C}$ such that $Z \stackrel{\mathscr{L}}{=} \alpha W$, where W is a standard complex Gaussian.

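Proof. We begin with Part (i). First note that since Z is a complex Gaussian, its
real and imaginary parts are jointly Gaussian, and it follows from Corollary 23.6.13
that there exist deterministic real numbers $a^{(1,1)}, a^{(1,2)}, a^{(2,1)}, a^{(2,2)}$ such that

\[ \begin{pmatrix} \mathrm{Re}(Z) \\ \mathrm{Im}(Z) \end{pmatrix} \stackrel{\mathscr{L}}{=}
   \begin{pmatrix} a^{(1,1)} & a^{(1,2)} \\ a^{(2,1)} & a^{(2,2)} \end{pmatrix}
   \begin{pmatrix} W_1 \\ W_2 \end{pmatrix}, \tag{24.11} \]

where $W_1$ and $W_2$ are independent real standard Gaussians. Next note that by
direct computation

\[ \begin{pmatrix} \mathrm{Re}(\alpha W + \beta W^*) \\ \mathrm{Im}(\alpha W + \beta W^*) \end{pmatrix} =
   \begin{pmatrix} \frac{\mathrm{Re}(\alpha) + \mathrm{Re}(\beta)}{\sqrt{2}} & \frac{\mathrm{Im}(\beta) - \mathrm{Im}(\alpha)}{\sqrt{2}} \\
                   \frac{\mathrm{Im}(\beta) + \mathrm{Im}(\alpha)}{\sqrt{2}} & \frac{\mathrm{Re}(\alpha) - \mathrm{Re}(\beta)}{\sqrt{2}} \end{pmatrix}
   \begin{pmatrix} \sqrt{2} \, \mathrm{Re}(W) \\ \sqrt{2} \, \mathrm{Im}(W) \end{pmatrix}. \tag{24.12} \]

Since, by the definition of a standard complex Gaussian W,

\[ \begin{pmatrix} W_1 \\ W_2 \end{pmatrix} \stackrel{\mathscr{L}}{=}
   \begin{pmatrix} \sqrt{2} \, \mathrm{Re}(W) \\ \sqrt{2} \, \mathrm{Im}(W) \end{pmatrix}, \tag{24.13} \]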
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it follows from (24.11), (24.12), and (24.13) that if a and (3 are chosen so that 

/ Rc(g)+Re(/3) Im(/3)-Im(g) \ , {ll) (1 2) 

I Im(/i)+Im(a) Rc(g)-Rc(/3) I = L(2,l) ffl (2,2) 
\ \/2 \/2 / 



i.e., if 



then 



_L(( (l.l) +0 (2,2)) +i ( (2,l)_ 0(1,2))^ 

i = (( a (l,l)_ a (2,2)) + i ( a (2,l) +a (l,2)^ 



/Re(Z)\ * /Re(aW + /3VF* 
Vlm(Z) i ~ [lm{aW + f3W* 



and (24.10) is satisfied. 

We next turn to Part (ii). One direction is straightforward: if $Z \stackrel{\mathcal{L}}{=} \alpha W$, then $Z$ 
must be proper because from (24.3) it follows that $\mathsf{E}[\alpha W] = \alpha \mathsf{E}[W] = 0$ and 
$\mathsf{E}[(\alpha W)^2] = \alpha^2 \mathsf{E}[W^2] = 0$. 

We next prove the other direction that if $Z$ is a proper complex Gaussian, then 
$Z \stackrel{\mathcal{L}}{=} \alpha W$ for some $\alpha \in \mathbb{C}$ and some standard complex Gaussian $W$. Let $Z$ be a 
proper complex Gaussian. By Part (i) it follows that there exist $\alpha, \beta \in \mathbb{C}$ such that 
(24.10) is satisfied. Consequently, for this choice of $\alpha$ and $\beta$ we have 

\begin{align*}
0 &= \mathsf{E}[Z^2] \\
  &= \mathsf{E}\bigl[(\alpha W + \beta W^*)^2\bigr] \\
  &= \alpha^2 \mathsf{E}[W^2] + 2\alpha\beta\, \mathsf{E}[W W^*] + \beta^2 \mathsf{E}\bigl[(W^*)^2\bigr] \\
  &= 2\alpha\beta,
\end{align*}

where the first equality follows because $Z$ is proper; the second because $\alpha$ and $\beta$ 
have been chosen so that (24.10) holds; the third by opening the brackets and using 
the linearity of expectation; and the fourth by (24.3) and (24.2). It follows that 
either $\alpha$ or $\beta$ must be zero. Since $W \stackrel{\mathcal{L}}{=} W^*$, there is no loss in generality in assuming 
that $\beta = 0$, thus establishing the existence of $\alpha \in \mathbb{C}$ such that $Z \stackrel{\mathcal{L}}{=} \alpha W$. □ 
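
The construction in the proof of Part (i) can be traced numerically. The sketch below assumes an arbitrary real $2 \times 2$ matrix as the starting point in (24.11), computes the coefficients of (24.10) from it, and rebuilds the matrix of (24.12). The function names and the example values are hypothetical.

```python
import math

def alpha_beta_from_real_matrix(a11, a12, a21, a22):
    """Coefficients (alpha, beta) for which alpha*W + beta*conj(W) has the
    real representation [[a11, a12], [a21, a22]] with respect to (W1, W2)."""
    s = 1 / math.sqrt(2)
    alpha = s * complex(a11 + a22, a21 - a12)
    beta = s * complex(a11 - a22, a21 + a12)
    return alpha, beta

def real_matrix_from_alpha_beta(alpha, beta):
    """The matrix that multiplies (sqrt(2) Re W, sqrt(2) Im W)^T in (24.12)."""
    s = 1 / math.sqrt(2)
    return [
        [s * (alpha.real + beta.real), s * (beta.imag - alpha.imag)],
        [s * (beta.imag + alpha.imag), s * (alpha.real - beta.real)],
    ]

target = [[2.0, -1.0], [0.5, 3.0]]   # hypothetical example values
alpha, beta = alpha_beta_from_real_matrix(2.0, -1.0, 0.5, 3.0)
recovered = real_matrix_from_alpha_beta(alpha, beta)

# For Z = alpha*W + beta*conj(W) we have E[Z^2] = 2*alpha*beta,
# so Z is proper exactly when this product vanishes.
pseudo_variance = 2 * alpha * beta
```

In this example `pseudo_variance` is nonzero, so the resulting $Z$ is a centered complex Gaussian that is not proper. This is consistent with Part (ii), since neither coefficient vanishes.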

By Proposition 24.2.11 (ii) we conclude that if Z is a proper complex Gaussian, then 
$Z \stackrel{\mathcal{L}}{=} \alpha W$ for some $\alpha \in \mathbb{C}$ and some standard complex Gaussian $W$. Consequently, 
the density of such a CRV $Z$ (that is not deterministically zero) is given by 

\[
f_Z(z) = \frac{f_W(z/\alpha)}{|\alpha|^2} = \frac{1}{\pi |\alpha|^2}\, e^{-|z|^2/|\alpha|^2}, \qquad z \in \mathbb{C},
\]

where the first equality follows from the way the density of a CRV behaves under 
linear transformations (Theorem 17.3.7 or Lemma 17.4.6), and where the second 
equality follows from (24.1). We thus conclude that if $Z$ is a proper complex Gaussian, 
then its density is radially-symmetric, and $Z$ must be circularly-symmetric. 
The reverse is also true: since every complex Gaussian is of finite variance, and since 
every finite-variance circularly-symmetric CRV is also proper (Proposition 24.2.8), 
we conclude that every circularly-symmetric complex Gaussian is proper. Thus: 

Proposition 24.2.12. A complex Gaussian is circularly-symmetric if, and only if, 
it is proper. 
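
As a small illustration of the radial symmetry used in this argument, the sketch below evaluates the density of $\alpha W$ at several rotations $e^{i\phi} z$ of one point. The function name and the chosen values of $\alpha$, $z$, and $\phi$ are assumptions made for the example.

```python
import cmath
import math

def proper_gaussian_density(z: complex, alpha: complex) -> float:
    """Density at z of alpha*W, where W is a standard complex Gaussian
    and alpha is nonzero."""
    var = abs(alpha) ** 2
    return math.exp(-abs(z) ** 2 / var) / (math.pi * var)

alpha = 1.0 + 2.0j   # hypothetical scaling; only |alpha| affects the density
z = 0.4 - 1.1j
reference = proper_gaussian_density(z, alpha)

# Rotating z by any angle phi leaves |z|, and hence the density, unchanged.
rotated_values = [
    proper_gaussian_density(cmath.exp(1j * phi) * z, alpha)
    for phi in (0.3, 1.0, 2.5, -3.0)
]
```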

The picture that thus emerges is the following. 

(i) Every finite-variance circularly-symmetric CRV is proper. 
(ii) Some proper CRVs are not circularly-symmetric. 
(iii) A Gaussian CRV is circularly-symmetric if, and only if, it is proper. 

We shall soon see that these observations extend to vectors too. In fact, the reader 
is encouraged to consult Figure 24.2 on Page 508, which holds also for CRVs. 

24.3 Vectors 

24.3.1 Standard Complex Gaussian Vectors 

Definition 24.3.1 (Standard Complex Gaussian Vector). A standard complex 
Gaussian vector is a complex random vector whose components are IID and each 
of them is a standard complex Gaussian random variable. 

If $\mathbf{W}$ is a standard complex Gaussian $n$-vector, then, by the independence of its $n$ 
components and by (24.1), its density is given by 

\[
f_{\mathbf{W}}(\mathbf{w}) = \frac{1}{\pi^n}\, e^{-\mathbf{w}^{\dagger}\mathbf{w}}, \qquad \mathbf{w} \in \mathbb{C}^n. \tag{24.14}
\]

By the independence of its components and by (24.3) 

\[
\mathsf{E}[\mathbf{W}] = \mathbf{0} \quad \text{and} \quad \mathsf{E}\bigl[\mathbf{W}\mathbf{W}^{\mathsf{T}}\bigr] = \mathsf{0}. \tag{24.15}
\]

Thus, every standard complex Gaussian vector is proper (Section 17.4.2). By the 
independence of the components and by (24.2) it also follows that 

\[
\mathsf{E}\bigl[\mathbf{W}\mathbf{W}^{\dagger}\bigr] = \mathsf{I}_n, \tag{24.16}
\]

where we remind the reader that $\mathsf{I}_n$ denotes the $n \times n$ identity matrix. 
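
The moment identities (24.15) and (24.16) can be illustrated by simulation. The following Monte Carlo sketch uses only the Python standard library; the function names, the seed, and the sample size are arbitrary choices, and the empirical averages match the identities only approximately.

```python
import math
import random

def standard_complex_gaussian(rng: random.Random) -> complex:
    """One draw of W whose real and imaginary parts are IID N(0, 1/2)."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def empirical_moments(n: int, trials: int, seed: int = 0):
    """Empirical versions of E[W W^dagger] and E[W W^T] for a standard
    complex Gaussian n-vector W, returned as two n x n nested lists."""
    rng = random.Random(seed)
    cov = [[0j] * n for _ in range(n)]
    pseudo = [[0j] * n for _ in range(n)]
    for _ in range(trials):
        w = [standard_complex_gaussian(rng) for _ in range(n)]
        for i in range(n):
            for j in range(n):
                cov[i][j] += w[i] * w[j].conjugate() / trials
                pseudo[i][j] += w[i] * w[j] / trials
    return cov, pseudo

cov, pseudo = empirical_moments(n=2, trials=20000)
```

Up to sampling error, `cov` should be close to the identity matrix and `pseudo` close to the all-zero matrix.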

24.3.2 Circularly-Symmetric Complex Random Vectors 

Definition 24.3.2 (Circularly-Symmetric Complex Random Vectors). We say that 
the complex random vector $\mathbf{Z}$ is circularly-symmetric if for every $\phi \in [-\pi, \pi)$ 
the law of $e^{i\phi}\mathbf{Z}$ is identical to the law of $\mathbf{Z}$. 

An equivalent definition can be given in terms of linear functionals: 

Proposition 24.3.3 (Circular Symmetry and Linear Functionals). Each of the 
following statements is equivalent to the statement that the complex random $n$-vector $\mathbf{Z}$ 
is circularly-symmetric. 

(a) For every $\phi \in [-\pi, \pi)$ the law of the complex random vector $e^{i\phi}\mathbf{Z}$ is the same 
as the law of $\mathbf{Z}$: 

\[
e^{i\phi}\mathbf{Z} \stackrel{\mathcal{L}}{=} \mathbf{Z}, \qquad \phi \in [-\pi, \pi). \tag{24.17}
\]

(b) For every deterministic vector $\boldsymbol{\alpha} \in \mathbb{C}^n$, the CRV $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is circularly-symmetric: 

\[
e^{i\phi}\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z} \stackrel{\mathcal{L}}{=} \boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}, \qquad \bigl(\boldsymbol{\alpha} \in \mathbb{C}^n,\ \phi \in [-\pi, \pi)\bigr). \tag{24.18}
\]



Proof. Statement (a) is just the definition of circular symmetry. We next show 
that the two statements (a) and (b) are equivalent. We begin by proving that (a) 
implies (b). This is the easy part because applying the same linear functional to 
two random vectors that have the same law results in random variables that have 
the same law. Consequently, (24.17) implies (24.18). 

We now prove that (b) implies (a). We thus assume (24.18) and set out to prove 
(24.17). By Theorem 17.4.4 it follows that to establish (24.17) it suffices to show 
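that the random vectors on the RHS and LHS of (24.17) have the same characteristic 
function, i.e., that 

\[
\mathsf{E}\Bigl[e^{i \operatorname{Re}(\boldsymbol{\varpi}^{\dagger} e^{i\phi}\mathbf{Z})}\Bigr]
= \mathsf{E}\Bigl[e^{i \operatorname{Re}(\boldsymbol{\varpi}^{\dagger}\mathbf{Z})}\Bigr],
\qquad \boldsymbol{\varpi} \in \mathbb{C}^n. \tag{24.19}
\]

But this readily follows from (24.18) because upon substituting $\boldsymbol{\varpi}^{\dagger}$ for $\boldsymbol{\alpha}^{\mathsf{T}}$ in 
(24.18) we obtain that 

\[
\boldsymbol{\varpi}^{\dagger}\mathbf{Z} \stackrel{\mathcal{L}}{=} \boldsymbol{\varpi}^{\dagger} e^{i\phi}\mathbf{Z}, \qquad \boldsymbol{\varpi} \in \mathbb{C}^n,
\]

and this implies (24.19), because if $Z_1 \stackrel{\mathcal{L}}{=} Z_2$, then $\mathsf{E}[g(Z_1)] = \mathsf{E}[g(Z_2)]$ for any 
measurable function $g$ and, in particular, for the function $g\colon \xi \mapsto e^{i \operatorname{Re}(\xi)}$. □ 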

The following proposition demonstrates that circular symmetry is preserved by 
linear transformations. 

Proposition 24.3.4 (Circular Symmetry and Linear Transformations). Let $\mathbf{Z}$ be a 
circularly-symmetric complex random $n$-vector and let $\mathsf{A}$ be a deterministic complex 
$m \times n$ matrix. Then the complex random $m$-vector $\mathsf{A}\mathbf{Z}$ is also circularly-symmetric. 

Proof. By Proposition 24.3.3 it follows that to establish that $\mathsf{A}\mathbf{Z}$ is 
circularly-symmetric it suffices to show that for every deterministic $\boldsymbol{\alpha} \in \mathbb{C}^m$ the random 
variable $\boldsymbol{\alpha}^{\mathsf{T}}\mathsf{A}\mathbf{Z}$ is circularly-symmetric. To show this, fix some arbitrary $\boldsymbol{\alpha} \in \mathbb{C}^m$. 
Because $\mathbf{Z}$ is circularly-symmetric, it follows from Proposition 24.3.3 that for every 
deterministic vector $\boldsymbol{\beta} \in \mathbb{C}^n$, the random variable $\boldsymbol{\beta}^{\mathsf{T}}\mathbf{Z}$ is circularly-symmetric. 
Choosing $\boldsymbol{\beta} = \mathsf{A}^{\mathsf{T}}\boldsymbol{\alpha}$ establishes that $\boldsymbol{\alpha}^{\mathsf{T}}\mathsf{A}\mathbf{Z}$ is circularly-symmetric. □ 

24.3.3 Proper vs. Circularly-Symmetric Vectors 

We now extend the relationship between properness and circular symmetry to 
vectors: 

Proposition 24.3.5 (Circular Symmetry Implies Properness). 

(i) Every finite-variance circularly-symmetric random vector is proper. 
(ii) Some proper random vectors are not circularly-symmetric. 

Proof. Part (ii) requires no proof because a CRV can be viewed as a complex 
random vector taking values in $\mathbb{C}$, and we have already seen in Section 24.2.3 an 
example of a CRV which is proper but not circularly-symmetric (Note 24.2.9). 

We now prove Part (i). Let $\mathbf{Z}$ be a finite-variance circularly-symmetric random 
$n$-vector. To establish that $\mathbf{Z}$ is proper we will show that for every $\boldsymbol{\alpha} \in \mathbb{C}^n$ the 
CRV $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is proper (Proposition 17.4.2). To this end, fix an arbitrary $\boldsymbol{\alpha} \in \mathbb{C}^n$. 
By Proposition 24.3.3 it follows that the CRV $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is circularly-symmetric. And 
because $\mathbf{Z}$ is of finite variance, so is $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$. Being a circularly-symmetric CRV of 
finite variance, $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ must be proper (Section 24.2.3). □ 

24.3.4 Complex Gaussian Vectors 

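Definition 24.3.6 (Complex Gaussian Vectors). A complex random $n$-vector $\mathbf{Z}$ is 
said to be a complex Gaussian vector if the real random $2n$-vector 

\[
\bigl(\operatorname{Re}(Z^{(1)}), \ldots, \operatorname{Re}(Z^{(n)}), \operatorname{Im}(Z^{(1)}), \ldots, \operatorname{Im}(Z^{(n)})\bigr)^{\mathsf{T}} \tag{24.20}
\]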

consisting of the real and imaginary parts of its components is a real Gaussian 
vector. A centered complex Gaussian vector is a zero-mean complex Gaussian 
vector. 

Note that, Theorem 23.6.7 notwithstanding, the distribution of a centered complex 
Gaussian vector is not uniquely specified by its covariance matrix. It is uniquely 
specified by the covariance matrix if the Gaussian vector is additionally known to 
be proper. This is a direct consequence of the following proposition. 

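Proposition 24.3.7. The distribution of a centered complex Gaussian vector $\mathbf{Z}$ is 
uniquely specified by the matrices 

\[
\mathsf{K} = \mathsf{E}\bigl[\mathbf{Z}\mathbf{Z}^{\dagger}\bigr] \quad \text{and} \quad \mathsf{L} = \mathsf{E}\bigl[\mathbf{Z}\mathbf{Z}^{\mathsf{T}}\bigr].
\]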

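Proof. Let $\mathbf{R}$ be the real $2n$-vector that results from stacking the real part of $\mathbf{Z}$ on 
top of its imaginary part as in (24.20). We will prove the proposition by showing 
that the matrices $\mathsf{K}$ and $\mathsf{L}$ uniquely specify the distribution of $\mathbf{R}$. 

Since $\mathbf{Z}$ is a complex Gaussian $n$-vector, $\mathbf{R}$ is a real Gaussian $2n$-vector. Since $\mathbf{Z}$ is 
of zero mean, so is $\mathbf{R}$. Consequently, the distribution of $\mathbf{R}$ is fully characterized by 
its covariance matrix $\mathsf{E}\bigl[\mathbf{R}\mathbf{R}^{\mathsf{T}}\bigr]$ (Theorem 23.6.7). The proof will thus be concluded 
once we show that the matrices $\mathsf{L}$ and $\mathsf{K}$ determine the covariance matrix of $\mathbf{R}$. 
Indeed, as we next verify, 

\[
\mathsf{E}\bigl[\mathbf{R}\mathbf{R}^{\mathsf{T}}\bigr] = \frac{1}{2}
\begin{pmatrix}
\operatorname{Re}(\mathsf{K}) + \operatorname{Re}(\mathsf{L}) & \operatorname{Im}(\mathsf{L}) - \operatorname{Im}(\mathsf{K}) \\
\operatorname{Im}(\mathsf{L}) + \operatorname{Im}(\mathsf{K}) & \operatorname{Re}(\mathsf{K}) - \operatorname{Re}(\mathsf{L})
\end{pmatrix}. \tag{24.21}
\]

To verify (24.21) one needs to compute each of the block entries separately. We 
shall see how this is done by computing the top-right entry. The rest of the entries 
are left for the reader to verify. 

\begin{align*}
\mathsf{E}\bigl[\operatorname{Re}(\mathbf{Z}) \operatorname{Im}(\mathbf{Z})^{\mathsf{T}}\bigr]
&= \mathsf{E}\biggl[\Bigl(\frac{\mathbf{Z} + \mathbf{Z}^*}{2}\Bigr)\Bigl(\frac{\mathbf{Z}^{\mathsf{T}} - \mathbf{Z}^{\dagger}}{2i}\Bigr)\biggr] \\
&= \frac{1}{2}\biggl(\frac{\mathsf{E}[\mathbf{Z}\mathbf{Z}^{\mathsf{T}}] - \mathsf{E}[\mathbf{Z}^*\mathbf{Z}^{\dagger}]}{2i} - \frac{\mathsf{E}[\mathbf{Z}\mathbf{Z}^{\dagger}] - \mathsf{E}[\mathbf{Z}^*\mathbf{Z}^{\mathsf{T}}]}{2i}\biggr) \\
&= \frac{1}{2}\bigl(\operatorname{Im}(\mathsf{L}) - \operatorname{Im}(\mathsf{K})\bigr).
\end{align*}

□ 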



Corollary 24.3.8. The distribution of a proper complex Gaussian vector is uniquely 
specified by its covariance matrix. 

Proof. Follows from Proposition 24.3.7 by noting that by specifying that a complex 
Gaussian is proper we are specifying that the matrix L is zero (Definition 17.4.1). 

□ 
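
The relation (24.21) can be checked in the scalar case $n = 1$, where $\mathsf{K}$ reduces to the real number $\mathsf{E}[|Z|^2]$ and $\mathsf{L}$ to $\mathsf{E}[Z^2]$. The sketch below assumes $Z = \alpha W + \beta W^*$ with hypothetical coefficients and compares the covariance obtained from $(\mathsf{K}, \mathsf{L})$ with the one computed directly from the real representation (24.12). All names are illustrative.

```python
import math

def real_covariance_scalar(k: float, l: complex):
    """Covariance matrix of (Re Z, Im Z) for a centered CRV Z with
    E[|Z|^2] = k and E[Z^2] = l; this is (24.21) with n = 1, Im(K) = 0."""
    return [
        [0.5 * (k + l.real), 0.5 * l.imag],
        [0.5 * l.imag, 0.5 * (k - l.real)],
    ]

# Hypothetical example: Z = alpha*W + beta*conj(W), W standard complex Gaussian.
alpha, beta = 1.0 + 0.5j, -0.25 + 2.0j
k = abs(alpha) ** 2 + abs(beta) ** 2   # E[|Z|^2]
l = 2 * alpha * beta                   # E[Z^2]

# Direct route: (Re Z, Im Z)^T = M (W1, W2)^T with W1, W2 IID N(0, 1),
# so the covariance matrix is M M^T.
s = 1 / math.sqrt(2)
m = [
    [s * (alpha.real + beta.real), s * (beta.imag - alpha.imag)],
    [s * (beta.imag + alpha.imag), s * (alpha.real - beta.real)],
]
direct = [[sum(m[i][r] * m[j][r] for r in range(2)) for j in range(2)]
          for i in range(2)]
from_k_l = real_covariance_scalar(k, l)
```

Setting `l = 0` in `real_covariance_scalar` gives a multiple of the identity. This is the scalar instance of Corollary 24.3.8: for a proper Gaussian the covariance alone determines the law.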

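Proposition 24.3.9 (Linear Transformations of Complex Gaussians). If $\mathbf{Z}$ is a 
complex Gaussian $n$-vector and if $\mathsf{A}$ and $\mathsf{B}$ are deterministic $m \times n$ complex 
matrices, then the $m$-vector 

\[
\mathsf{A}\mathbf{Z} + \mathsf{B}\mathbf{Z}^*
\]

is a complex Gaussian. 

Proof. Define the complex random $m$-vector $\mathbf{C} = \mathsf{A}\mathbf{Z} + \mathsf{B}\mathbf{Z}^*$. To prove that $\mathbf{C}$ is 
Gaussian we recall that linearly transforming a real Gaussian vector yields a real 
Gaussian vector (Proposition 23.6.3), and we note that the real random $2m$-vector 
whose components are the real and imaginary parts of $\mathbf{C}$ can be expressed as the 
result of applying a linear transformation to the real Gaussian $2n$-vector whose 
components are the real and imaginary parts of the components of $\mathbf{Z}$: 

\[
\begin{pmatrix} \operatorname{Re}(\mathbf{C}) \\ \operatorname{Im}(\mathbf{C}) \end{pmatrix}
=
\begin{pmatrix}
\operatorname{Re}(\mathsf{A}) + \operatorname{Re}(\mathsf{B}) & \operatorname{Im}(\mathsf{B}) - \operatorname{Im}(\mathsf{A}) \\
\operatorname{Im}(\mathsf{A}) + \operatorname{Im}(\mathsf{B}) & \operatorname{Re}(\mathsf{A}) - \operatorname{Re}(\mathsf{B})
\end{pmatrix}
\begin{pmatrix} \operatorname{Re}(\mathbf{Z}) \\ \operatorname{Im}(\mathbf{Z}) \end{pmatrix}.
\]

□ 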



Proposition 24.3.10 (Characterizing Complex Gaussian Vectors). Each of the 
following statements is equivalent to the statement that $\mathbf{Z}$ is a complex Gaussian 
$n$-vector. 

(a) The real random vector whose $2n$ components correspond to the real and 
imaginary parts of $\mathbf{Z}$ is a real Gaussian vector. 

(b) For every deterministic vector $\boldsymbol{\alpha} \in \mathbb{C}^n$, the CRV $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is a complex Gaussian 
random variable. 

(c) There exist complex $n \times m$ matrices $\mathsf{A}$ and $\mathsf{B}$ and a vector $\boldsymbol{\mu} \in \mathbb{C}^n$ such that 

\[
\mathbf{Z} \stackrel{\mathcal{L}}{=} \mathsf{A}\mathbf{W} + \mathsf{B}\mathbf{W}^* + \boldsymbol{\mu}
\]

for some standard complex Gaussian random $m$-vector $\mathbf{W}$. 

Proof. Statement (a) is just the definition of a Gaussian complex random vector. 

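We next prove the equivalence of (a) and (b). That (a) implies (b) follows from 
Proposition 24.3.9 (by substituting $\boldsymbol{\alpha}^{\mathsf{T}}$ for $\mathsf{A}$ and $\mathsf{0}$ for $\mathsf{B}$). 

To prove that (b) $\Rightarrow$ (a) it suffices (by Definition 24.3.6 and Theorem 23.6.17) to 
show that (b) implies that any real linear functional of the real random $2n$-vector 
comprising the real and imaginary parts of $\mathbf{Z}$ is a real Gaussian random variable, 
i.e., that for every choice of the real constants $\alpha^{(1)}, \ldots, \alpha^{(n)}$ and $\beta^{(1)}, \ldots, \beta^{(n)}$ the 
random variable 

\[
\sum_{j=1}^{n} \alpha^{(j)} \operatorname{Re}\bigl(Z^{(j)}\bigr) + \sum_{j=1}^{n} \beta^{(j)} \operatorname{Im}\bigl(Z^{(j)}\bigr) \tag{24.22}
\]

is a Gaussian real random variable. To that end we rewrite (24.22) as 

\begin{align*}
\sum_{j=1}^{n} \alpha^{(j)} \operatorname{Re}\bigl(Z^{(j)}\bigr) + \sum_{j=1}^{n} \beta^{(j)} \operatorname{Im}\bigl(Z^{(j)}\bigr)
&= \boldsymbol{\alpha}^{\mathsf{T}} \operatorname{Re}(\mathbf{Z}) + \boldsymbol{\beta}^{\mathsf{T}} \operatorname{Im}(\mathbf{Z}) \tag{24.23} \\
&= \operatorname{Re}\bigl((\boldsymbol{\alpha} - i\boldsymbol{\beta})^{\mathsf{T}} \mathbf{Z}\bigr), \tag{24.24}
\end{align*}

where we define the real vectors $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ as $\boldsymbol{\alpha} = (\alpha^{(1)}, \ldots, \alpha^{(n)})^{\mathsf{T}} \in \mathbb{R}^n$ and 
$\boldsymbol{\beta} = (\beta^{(1)}, \ldots, \beta^{(n)})^{\mathsf{T}} \in \mathbb{R}^n$. Now (b) implies that $(\boldsymbol{\alpha} - i\boldsymbol{\beta})^{\mathsf{T}}\mathbf{Z}$ is a Gaussian 
complex random variable, so its real part $\operatorname{Re}\bigl((\boldsymbol{\alpha} - i\boldsymbol{\beta})^{\mathsf{T}}\mathbf{Z}\bigr)$ must be a real Gaussian 
random variable (Definition 24.2.10 and Proposition 23.6.6), thus establishing 
that (b) implies that (24.22) is a real Gaussian random variable. 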

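We next turn to proving the equivalence of (a) and (c). That (c) implies (a) follows 
directly from Proposition 24.3.9 applied to the Gaussian vector $\mathbf{W}$. The proof of 
the implication (a) $\Rightarrow$ (c) is very similar to the proof of its scalar version (24.10). 
We first note that since we can choose $\boldsymbol{\mu} = \mathsf{E}[\mathbf{Z}]$, it suffices to prove the result for 
the centered case. Now (a) implies that there exist $n \times n$ matrices $\mathsf{D}, \mathsf{E}, \mathsf{F}, \mathsf{G}$ such 
that 

\[
\begin{pmatrix} \operatorname{Re}(\mathbf{Z}) \\ \operatorname{Im}(\mathbf{Z}) \end{pmatrix}
\stackrel{\mathcal{L}}{=}
\begin{pmatrix} \mathsf{D} & \mathsf{E} \\ \mathsf{F} & \mathsf{G} \end{pmatrix}
\begin{pmatrix} \mathbf{W}_1 \\ \mathbf{W}_2 \end{pmatrix}, \tag{24.25}
\]

where $\mathbf{W}_1$ and $\mathbf{W}_2$ are independent real standard Gaussian $n$-vectors 
(Definition 23.1.1). On the other hand 

\[
\begin{pmatrix} \operatorname{Re}(\mathsf{A}\mathbf{W} + \mathsf{B}\mathbf{W}^*) \\ \operatorname{Im}(\mathsf{A}\mathbf{W} + \mathsf{B}\mathbf{W}^*) \end{pmatrix}
=
\begin{pmatrix}
\frac{\operatorname{Re}(\mathsf{A}) + \operatorname{Re}(\mathsf{B})}{\sqrt{2}} & \frac{\operatorname{Im}(\mathsf{B}) - \operatorname{Im}(\mathsf{A})}{\sqrt{2}} \\
\frac{\operatorname{Im}(\mathsf{B}) + \operatorname{Im}(\mathsf{A})}{\sqrt{2}} & \frac{\operatorname{Re}(\mathsf{A}) - \operatorname{Re}(\mathsf{B})}{\sqrt{2}}
\end{pmatrix}
\begin{pmatrix} \sqrt{2}\operatorname{Re}(\mathbf{W}) \\ \sqrt{2}\operatorname{Im}(\mathbf{W}) \end{pmatrix}. \tag{24.26}
\]

If $\mathbf{W}$ is a standard complex Gaussian, then 

\[
\begin{pmatrix} \sqrt{2}\operatorname{Re}(\mathbf{W}) \\ \sqrt{2}\operatorname{Im}(\mathbf{W}) \end{pmatrix}
\stackrel{\mathcal{L}}{=}
\begin{pmatrix} \mathbf{W}_1 \\ \mathbf{W}_2 \end{pmatrix},
\]

where $\mathbf{W}_1$ and $\mathbf{W}_2$ are as above. Consequently, the representations (24.25) and 
(24.26) agree if 

\[
\begin{pmatrix} \mathsf{D} & \mathsf{E} \\ \mathsf{F} & \mathsf{G} \end{pmatrix}
=
\begin{pmatrix}
\frac{\operatorname{Re}(\mathsf{A}) + \operatorname{Re}(\mathsf{B})}{\sqrt{2}} & \frac{\operatorname{Im}(\mathsf{B}) - \operatorname{Im}(\mathsf{A})}{\sqrt{2}} \\
\frac{\operatorname{Im}(\mathsf{B}) + \operatorname{Im}(\mathsf{A})}{\sqrt{2}} & \frac{\operatorname{Re}(\mathsf{A}) - \operatorname{Re}(\mathsf{B})}{\sqrt{2}}
\end{pmatrix},
\]

i.e., if we set 

\[
\mathsf{A} = \frac{1}{\sqrt{2}}\bigl((\mathsf{D} + \mathsf{G}) + i(\mathsf{F} - \mathsf{E})\bigr),
\qquad
\mathsf{B} = \frac{1}{\sqrt{2}}\bigl((\mathsf{D} - \mathsf{G}) + i(\mathsf{F} + \mathsf{E})\bigr). \qquad □
\]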
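
The last step of the proof can be traced on a small example. The sketch below assumes hypothetical $2 \times 2$ real blocks, forms $\mathsf{A}$ and $\mathsf{B}$ from them, and then recomputes the four blocks of the matrix in (24.26) to confirm that they are recovered. Matrices are plain nested lists, and all helper names are illustrative.

```python
import math

def combine(x, y, cx, cy):
    """Entrywise cx*x + cy*y for two equally sized matrices (nested lists)."""
    return [[cx * a + cy * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def real_part(m):
    return [[complex(z).real for z in row] for row in m]

def imag_part(m):
    return [[complex(z).imag for z in row] for row in m]

def complex_from_real_blocks(d, e, f, g):
    """A and B such that AW + BW* has real representation [[D, E], [F, G]]."""
    s = 1 / math.sqrt(2)
    a = combine(combine(d, g, 1, 1), combine(f, e, 1, -1), s, s * 1j)
    b = combine(combine(d, g, 1, -1), combine(f, e, 1, 1), s, s * 1j)
    return a, b

def real_blocks_from_complex(a, b):
    """The four blocks of the matrix multiplying
    (sqrt(2) Re W, sqrt(2) Im W)^T in (24.26)."""
    s = 1 / math.sqrt(2)
    d = combine(real_part(a), real_part(b), s, s)
    e = combine(imag_part(b), imag_part(a), s, -s)
    f = combine(imag_part(b), imag_part(a), s, s)
    g = combine(real_part(a), real_part(b), s, -s)
    return d, e, f, g

# Hypothetical example blocks.
d = [[1.0, 0.0], [0.5, 2.0]]
e = [[0.0, -1.0], [1.0, 0.0]]
f = [[0.3, 0.0], [0.0, 0.3]]
g = [[2.0, 1.0], [0.0, 1.0]]

a, b = complex_from_real_blocks(d, e, f, g)
recovered = real_blocks_from_complex(a, b)
```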

24.3.5 Proper Complex Gaussian Vectors 

A proper complex Gaussian vector is a complex Gaussian vector that is also proper 
(Definition 17.4.1). Thus, $\mathbf{Z}$ is a proper complex Gaussian vector if it is a centered 
complex Gaussian vector satisfying $\mathsf{E}\bigl[\mathbf{Z}\mathbf{Z}^{\mathsf{T}}\bigr] = \mathsf{0}$. 

Recall that, by Proposition 24.3.5, every finite-variance circularly-symmetric com- 
plex random vector is also proper, but that some random vectors are proper and not 
circularly-symmetric. We next show that for Gaussian vectors, circular symmetry 
is equivalent to properness. 

Proposition 24.3.11 (For Complex Gaussians, Proper = Circularly-Symmetric). 

A complex Gaussian vector is proper if, and only if, it is circularly-symmetric. 

Proof. Every circularly-symmetric complex Gaussian is proper, because every com- 
plex Gaussian is of finite-variance, and every finite-variance circularly-symmetric 
complex random vector is proper (Proposition 24.3.5). 

We now turn to the reverse implication, i.e., that if a complex Gaussian vector 
is proper, then it is circularly-symmetric. Assume that $\mathbf{Z}$ is a proper Gaussian 
$n$-vector. We will prove that $\mathbf{Z}$ is circularly-symmetric using Proposition 24.3.3 by 
showing that for every deterministic vector $\boldsymbol{\alpha} \in \mathbb{C}^n$ the random variable $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is 
circularly-symmetric. 

To that end, fix some arbitrary $\boldsymbol{\alpha} \in \mathbb{C}^n$. Since $\mathbf{Z}$ is a Gaussian vector, it follows that 
$\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is a Gaussian CRV (Proposition 24.3.9 with the substitution of $\boldsymbol{\alpha}^{\mathsf{T}}$ for $\mathsf{A}$ and 
$\mathsf{0}$ for $\mathsf{B}$). Moreover, since $\mathbf{Z}$ is proper, so is $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ (Proposition 17.4.2). We have thus 
established that $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is a proper Gaussian CRV and hence, by Proposition 24.2.12, 
also circularly-symmetric. □ 

The relationship between circular symmetry, properness, and Gaussianity is illus- 
trated in Figure 24.2. 

We next address the existence of a proper complex Gaussian of a given covariance 
matrix. We first recall that we say that a complex $n \times n$ matrix $\mathsf{K}$ is complex 
positive semidefinite and write $\mathsf{K} \succeq 0$ if $\boldsymbol{\alpha}^{\dagger}\mathsf{K}\boldsymbol{\alpha}$ is a nonnegative real number for 
every $\boldsymbol{\alpha} \in \mathbb{C}^n$. Recall also that an $n \times n$ complex matrix $\mathsf{K}$ is a complex positive 
semidefinite matrix if, and only if, there exists a complex $n \times n$ matrix $\mathsf{S}$ such that 
$\mathsf{K} = \mathsf{S}\mathsf{S}^{\dagger}$; see (Axler, 1997, Chapter 7, Theorem 7.27). 

Figure 24.2: The relationship between circular symmetry, Gaussianity, and properness. 
The outer region corresponds to all complex random vectors. Within that is 
the set of all vectors whose components are of finite variance. Within it is the family 
of all proper random vectors. The slanted lines indicate the circularly-symmetric 
vectors, and the gray area corresponds to the Gaussian vectors. The same relations 
hold for scalars and for stochastic processes. 



Proposition 24.3.12. 

(i) Given any $n \times n$ complex positive semidefinite matrix $\mathsf{K}$, there exists a proper 
complex Gaussian $n$-vector whose covariance matrix is $\mathsf{K}$. 

(ii) The distribution of a proper Gaussian complex vector is fully specified by its 
covariance matrix. 

(iii) If $\mathbf{Z}$ is a proper complex Gaussian $n$-vector of nonsingular covariance matrix 
$\mathsf{K}$, then its density is given by: 

\[
f_{\mathbf{Z}}(\mathbf{z}) = \frac{1}{\pi^n \det \mathsf{K}}\, e^{-\mathbf{z}^{\dagger}\mathsf{K}^{-1}\mathbf{z}}, \qquad \mathbf{z} \in \mathbb{C}^n. \tag{24.27}
\]

Note 24.3.13. We denote the distribution of a proper Gaussian complex vector of 
covariance matrix $\mathsf{K}$ by 

\[
\mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathsf{K}).
\]

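Proof. To prove (i) we note that since $\mathsf{K}$ is positive semidefinite, it follows that 
there exists an $n \times n$ matrix $\mathsf{S}$ such that 

\[
\mathsf{K} = \mathsf{S}\mathsf{S}^{\dagger}. \tag{24.28}
\]

Consider now the vector 

\[
\mathbf{Z} = \mathsf{S}\mathbf{W}, \tag{24.29}
\]

where $\mathbf{W}$ is a standard complex Gaussian $n$-vector. We will show that $\mathbf{Z}$ has the 
desired properties. First, it must be Gaussian because it is the result of applying 
a deterministic linear mapping to the Gaussian vector $\mathbf{W}$ (Proposition 24.3.9). It 
is centered because $\mathbf{W}$ is centered (24.15) and because $\mathsf{E}[\mathsf{S}\mathbf{W}] = \mathsf{S}\,\mathsf{E}[\mathbf{W}]$. It is 
proper because it is the result of linearly transforming the proper complex random 
vector $\mathbf{W}$ (Proposition 17.4.3 and (24.15)). Finally, its covariance matrix is 

\begin{align*}
\mathsf{E}\bigl[(\mathsf{S}\mathbf{W})(\mathsf{S}\mathbf{W})^{\dagger}\bigr]
&= \mathsf{E}\bigl[\mathsf{S}\mathbf{W}\mathbf{W}^{\dagger}\mathsf{S}^{\dagger}\bigr] \\
&= \mathsf{S}\,\mathsf{E}\bigl[\mathbf{W}\mathbf{W}^{\dagger}\bigr]\,\mathsf{S}^{\dagger} \\
&= \mathsf{S}\,\mathsf{I}_n\,\mathsf{S}^{\dagger} \\
&= \mathsf{K}.
\end{align*}

Part (ii) was proved in Corollary 24.3.8. 

To prove (iii) we use (24.29) & (24.28) along with the change of variables formula 
(Lemma 17.4.6) and the density of a standard Gaussian complex random vector 
(24.14) to obtain 

\begin{align*}
f_{\mathbf{Z}}(\mathbf{z})
&= \frac{1}{|\det \mathsf{S}|^2}\, f_{\mathbf{W}}\bigl(\mathsf{S}^{-1}\mathbf{z}\bigr) \\
&= \frac{1}{\pi^n \det(\mathsf{S}\mathsf{S}^{\dagger})}\, e^{-(\mathsf{S}^{-1}\mathbf{z})^{\dagger}\mathsf{S}^{-1}\mathbf{z}} \\
&= \frac{1}{\pi^n \det \mathsf{K}}\, e^{-\mathbf{z}^{\dagger}\mathsf{K}^{-1}\mathbf{z}}, \qquad \mathbf{z} \in \mathbb{C}^n. \qquad □
\end{align*}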
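
The two steps of this proof, factoring $\mathsf{K}$ and changing variables, can be illustrated for $n = 2$. The sketch below assumes a hypothetical Hermitian positive definite $\mathsf{K}$, uses a hand-written Cholesky factorization as one possible choice of $\mathsf{S}$ (the proof only needs some $\mathsf{S}$ with $\mathsf{K} = \mathsf{S}\mathsf{S}^{\dagger}$), and compares the closed form (24.27) with the change-of-variables expression. All function names are illustrative.

```python
import math

def cholesky_2x2(k):
    """Lower-triangular S with S S^dagger = K for a 2x2 Hermitian positive
    definite K given as nested lists of complex numbers."""
    s11 = math.sqrt(k[0][0].real)
    s21 = k[1][0] / s11
    s22 = math.sqrt(k[1][1].real - abs(s21) ** 2)
    return [[s11, 0.0], [s21, s22]]

def density_formula(z, k):
    """Right-hand side of (24.27) for n = 2."""
    det = (k[0][0] * k[1][1] - k[0][1] * k[1][0]).real
    # z^dagger K^{-1} z via the closed-form inverse of a 2x2 matrix.
    quad = (
        k[1][1] * abs(z[0]) ** 2
        - z[0].conjugate() * k[0][1] * z[1]
        - z[1].conjugate() * k[1][0] * z[0]
        + k[0][0] * abs(z[1]) ** 2
    ).real / det
    return math.exp(-quad) / (math.pi ** 2 * det)

def density_change_of_variables(z, s):
    """f_W(S^{-1} z) / |det S|^2 for a lower-triangular 2x2 matrix S."""
    w1 = z[0] / s[0][0]                      # forward substitution
    w2 = (z[1] - s[1][0] * w1) / s[1][1]
    det_s_sq = abs(s[0][0] * s[1][1]) ** 2
    return math.exp(-(abs(w1) ** 2 + abs(w2) ** 2)) / (math.pi ** 2 * det_s_sq)

k = [[2 + 0j, 1 - 1j], [1 + 1j, 3 + 0j]]   # hypothetical covariance matrix
s = cholesky_2x2(k)
z = [0.5 - 0.2j, -1.0 + 0.3j]
```

A sample from $\mathcal{N}_{\mathbb{C}}(\mathbf{0}, \mathsf{K})$ would then be obtained by applying `s` to a vector of two independent standard complex Gaussians, as in (24.29).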



24.4 Exercises 

Exercise 24.1 (The Complex Conjugate of a Circularly-Symmetric CRV). Must the com- 
plex conjugate of a circularly-symmetric CRV be circularly-symmetric? 

Exercise 24.2 (Scaled Circularly-Symmetric CRV). Show that if $Z$ is circularly-symmetric 
and if $\alpha \in \mathbb{C}$ is deterministic, then the distribution of $\alpha Z$ depends on $\alpha$ only via its 
magnitude $|\alpha|$. 

Exercise 24.3 (The n-th Power of a Circularly-Symmetric CRV). Show that if $Z$ is a 
circularly-symmetric CRV and if $n$ is a positive integer, then $Z^n$ is circularly-symmetric. 

Exercise 24.4 (The Characteristic Function of Circularly-Symmetric CRVs). Show that a 
CRV $Z$ is circularly-symmetric if, and only if, its characteristic function $\Phi_Z(\cdot)$ is 
radially-symmetric in the sense that $\Phi_Z(\varpi)$ depends on $\varpi$ only via its magnitude $|\varpi|$. 

Exercise 24.5 (Multiplying Independent CRVs). Show that the product of two indepen- 
dent complex random variables is circularly-symmetric whenever (at least) one of them 
is circularly-symmetric. 

Exercise 24.6 (The Complex Conjugate of a Gaussian CRV). Must the complex conjugate 
of a Gaussian CRV be Gaussian? 

Exercise 24.7 (Independent Components). Show that if the complex random variables 
W and Z are circularly-symmetric and independent, then the random vector (W, Z) is 
circularly-symmetric. 

Exercise 24.8 (The Characteristic Function of a Proper Complex Gaussian Vector). 

Compute the characteristic function of a proper complex Gaussian vector of covariance 
matrix K. 

Exercise 24.9 (Jointly Circularly-Symmetric Complex Gaussians). As in Definition 23.7.1, 
we can also define jointly complex Gaussians and jointly circularly-symmetric complex 
Gaussians. Extend the results of Section 23.7 by showing: 

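(i) Two centered jointly complex Gaussian vectors $\mathbf{Z}_1$ and $\mathbf{Z}_2$ are independent if, and 
only if, they satisfy 

\[
\mathsf{E}\bigl[\mathbf{Z}_1\mathbf{Z}_2^{\dagger}\bigr] = \mathsf{0} \quad \text{and} \quad \mathsf{E}\bigl[\mathbf{Z}_1\mathbf{Z}_2^{\mathsf{T}}\bigr] = \mathsf{0}.
\]

(ii) Two jointly circularly-symmetric complex Gaussian vectors $\mathbf{Z}_1$ and $\mathbf{Z}_2$ are 
independent if, and only if, they satisfy 

\[
\mathsf{E}\bigl[\mathbf{Z}_1\mathbf{Z}_2^{\dagger}\bigr] = \mathsf{0}.
\]

(iii) If $\mathbf{Z}_1, \mathbf{Z}_2$ are centered jointly complex Gaussians, then, conditional on $\mathbf{Z}_2 = \mathbf{z}_2$, the 
complex random vector $\mathbf{Z}_1$ is a complex Gaussian such that 

\[
\mathsf{E}\Bigl[\bigl(\mathbf{Z}_1 - \mathsf{E}[\mathbf{Z}_1 \mid \mathbf{Z}_2 = \mathbf{z}_2]\bigr)\bigl(\mathbf{Z}_1 - \mathsf{E}[\mathbf{Z}_1 \mid \mathbf{Z}_2 = \mathbf{z}_2]\bigr)^{\dagger} \Bigm| \mathbf{Z}_2 = \mathbf{z}_2\Bigr]
\]

and 

\[
\mathsf{E}\Bigl[\bigl(\mathbf{Z}_1 - \mathsf{E}[\mathbf{Z}_1 \mid \mathbf{Z}_2 = \mathbf{z}_2]\bigr)\bigl(\mathbf{Z}_1 - \mathsf{E}[\mathbf{Z}_1 \mid \mathbf{Z}_2 = \mathbf{z}_2]\bigr)^{\mathsf{T}} \Bigm| \mathbf{Z}_2 = \mathbf{z}_2\Bigr]
\]

do not depend on $\mathbf{z}_2$ and such that the conditional mean $\mathsf{E}[\mathbf{Z}_1 \mid \mathbf{Z}_2 = \mathbf{z}_2]$ can be 
expressed as $\mathsf{A}\mathbf{z}_2 + \mathsf{B}\mathbf{z}_2^*$ for some matrices $\mathsf{A}$ and $\mathsf{B}$ that do not depend on $\mathbf{z}_2$. 

(iv) If $\mathbf{Z}_1, \mathbf{Z}_2$ are jointly circularly-symmetric complex Gaussians, then, conditional on 
$\mathbf{Z}_2 = \mathbf{z}_2$, the complex random vector $\mathbf{Z}_1$ is a circularly-symmetric complex Gaussian 
of a covariance matrix that does not depend on $\mathbf{z}_2$ and of a mean that can be 
expressed as $\mathsf{A}\mathbf{z}_2$ for some matrix $\mathsf{A}$ that does not depend on $\mathbf{z}_2$. 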

Exercise 24.10 (Limits of Complex Gaussians). Extend the definition of almost-sure 
convergence (23.71) to complex random vectors, and show that if the complex Gaussian 
$d$-vectors $\mathbf{Z}_1, \mathbf{Z}_2, \ldots$ converge to $\mathbf{Z}$ almost surely, then $\mathbf{Z}$ must be a complex Gaussian. 

Exercise 24.11 (Limits of Circularly-Symmetric Complex Random Variables). Consider 
a sequence $Z_1, Z_2, \ldots$ of circularly-symmetric complex random variables that converges 
almost surely to the CRV $Z$. Show that $Z$ must be circularly-symmetric. Extend this 
result to complex random vectors. 

Hint: Consider the characteristic functions of $Z, Z_1, Z_2, \ldots$, and recall the proof of 
Theorem 19.9.1. 

Exercise 24.12 (Limits of Circularly-Symmetric Complex Gaussians). Let $Z_1, Z_2, \ldots$ be 
a sequence of circularly-symmetric complex Gaussians that converges almost surely to 
the CRV $Z$. Show that $Z$ must be a circularly-symmetric Gaussian. Extend to complex 
random vectors. 

Hint: Either combine Exercises 24.10 & 24.11 or prove directly using the characteristic 
function as in the proof of Theorem 19.9.1. 



Chapter 25 

Continuous-Time Stochastic Processes 

25.1 Notation 

Recall from Section 12.2 that a continuous-time stochastic process $\bigl(X(t),\ t \in \mathbb{R}\bigr)$ 
is a family of random variables that are defined on a common probability space 
$(\Omega, \mathcal{F}, P)$ and that are indexed by the real line (time). We denote by $X(t)$ the 
time-$t$ sample of $\bigl(X(t),\ t \in \mathbb{R}\bigr)$, i.e., the random variable to which $t$ is mapped 
(the RV indexed by $t$). This RV is sometimes also called the state at time $t$. 
Rather than writing $\bigl(X(t),\ t \in \mathbb{R}\bigr)$, we sometimes denote the SP by $\bigl(X(t)\bigr)$ or 
by $X$. Perhaps the clearest way to denote the process is as a mapping: 

\[
X\colon \Omega \times \mathbb{R} \to \mathbb{R}, \qquad (\omega, t) \mapsto X(\omega, t).
\]

For a fixed $t \in \mathbb{R}$, the time-$t$ sample $X(t)$ is the mapping $X(\cdot, t)$ from $\Omega$ to the real 
line, i.e., the RV $\omega \mapsto X(\omega, t)$ indexed by $t$. If we fix $\omega \in \Omega$ and view $X(\omega, \cdot)$ as a 
mapping $t \mapsto X(\omega, t)$, then we obtain a function of time. This function is called a 
trajectory, sample-path, path, sample-function, or realization. 

$\omega \mapsto X(\omega, t)$    time-$t$ sample for a fixed $t \in \mathbb{R}$ (random variable) 
$t \mapsto X(\omega, t)$    trajectory for a fixed $\omega \in \Omega$ (function of time) 

Recall also from Section 12.2 that the process is centered if for every $t \in \mathbb{R}$ the 
RV $X(t)$ is of zero mean. It is of finite variance if for every $t \in \mathbb{R}$ the RV $X(t)$ 
is of finite variance. 



25.2 The Finite-Dimensional Distributions 

The finite-dimensional distributions (FDDs) of a continuous-time SP is the family 
of all joint distributions of $n$-tuples of the form $\bigl(X(t_1), \ldots, X(t_n)\bigr)$, where $n$ can 
be any positive integer and $t_1, \ldots, t_n \in \mathbb{R}$ are arbitrary epochs. To specify the 
FDDs of a SP $\bigl(X(t)\bigr)$ one must thus specify for every $n \in \mathbb{N}$ and for every choice of 
the epochs $t_1, \ldots, t_n \in \mathbb{R}$ the distribution of the $n$-tuple $\bigl(X(t_1), \ldots, X(t_n)\bigr)$. This 
is a conceptually clear if formidable task. We denote the cumulative distribution 
function of the $n$-tuple $\bigl(X(t_1), \ldots, X(t_n)\bigr)$ by 

\[
F_n(\xi_1, \ldots, \xi_n;\ t_1, \ldots, t_n) \triangleq \Pr\bigl[X(t_1) \le \xi_1, \ldots, X(t_n) \le \xi_n\bigr].
\]

We next show that the FDDs of every SP $\bigl(X(t)\bigr)$ must satisfy two key properties: 
the symmetry property and the consistency property. The symmetry property 
is that $F_n(\cdot\,;\cdot)$ is unaltered when we simultaneously permute its right arguments 
(the $t$'s) and its left arguments (the $\xi$'s) by the same permutation. That is, for 
every $n \in \mathbb{N}$; every choice of the epochs $t_1, \ldots, t_n \in \mathbb{R}$; every $\xi_1, \ldots, \xi_n \in \mathbb{R}$; and 
every permutation $\pi$ on $\{1, \ldots, n\}$ 

\[
F_n\bigl(\xi_{\pi(1)}, \ldots, \xi_{\pi(n)};\ t_{\pi(1)}, \ldots, t_{\pi(n)}\bigr) = F_n(\xi_1, \ldots, \xi_n;\ t_1, \ldots, t_n). \tag{25.1}
\]

This property is a generalization to $n$-tuples of the obvious fact that if $X$ and $Y$ are 
random variables, then $\Pr[X \le x, Y \le y] = \Pr[Y \le y, X \le x]$ for every $x, y \in \mathbb{R}$. 

The consistency property is that whenever $n \in \mathbb{N}$ and $t_1, \ldots, t_n, \xi_1, \ldots, \xi_{n-1} \in \mathbb{R}$, 

\[
\lim_{\xi_n \to \infty} F_n(\xi_1, \ldots, \xi_{n-1}, \xi_n;\ t_1, \ldots, t_{n-1}, t_n) = F_{n-1}(\xi_1, \ldots, \xi_{n-1};\ t_1, \ldots, t_{n-1}). \tag{25.2}
\]

This property is a consequence of the fact that the set 

\[
\bigl\{\omega \in \Omega : X(\omega, t_1) \le \xi_1, \ldots, X(\omega, t_{n-1}) \le \xi_{n-1}, X(\omega, t_n) \le \xi_n\bigr\}
\]

is increasing in $\xi_n$ and converges as $\xi_n$ tends to infinity to the set 

\[
\bigl\{\omega \in \Omega : X(\omega, t_1) \le \xi_1, \ldots, X(\omega, t_{n-1}) \le \xi_{n-1}\bigr\}.
\]

The key result on the existence of stochastic processes of given FDDs is Kol- 
mogorov's Existence Theorem, which states that the symmetry and consistency 
properties suffice for a family of finite-dimensional distributions to correspond to 
the FDDs of some SP. 

Theorem 25.2.1 (Kolmogorov's Existence Theorem). Let $G_1(\cdot\,;\cdot), G_2(\cdot\,;\cdot), \ldots$ be 
a sequence of functions $G_n\colon \mathbb{R}^n \times \mathbb{R}^n \to [0, 1]$ satisfying 

1) that for every $n \ge 1$ and every $t_1, \ldots, t_n \in \mathbb{R}$ the function $G_n(\cdot\,;\ t_1, \ldots, t_n)$ is 
a valid joint distribution function;¹ 

2) the symmetry property 

\[
G_n\bigl(\xi_{\pi(1)}, \ldots, \xi_{\pi(n)};\ t_{\pi(1)}, \ldots, t_{\pi(n)}\bigr) = G_n(\xi_1, \ldots, \xi_n;\ t_1, \ldots, t_n),
\]
\[
t_1, \ldots, t_n, \xi_1, \ldots, \xi_n \in \mathbb{R}, \quad \pi \text{ a permutation on } \{1, \ldots, n\}; \tag{25.3}
\]

3) and the consistency property 

\[
\lim_{\xi_n \to \infty} G_n(\xi_1, \ldots, \xi_{n-1}, \xi_n;\ t_1, \ldots, t_{n-1}, t_n) = G_{n-1}(\xi_1, \ldots, \xi_{n-1};\ t_1, \ldots, t_{n-1}),
\]
\[
t_1, \ldots, t_n, \xi_1, \ldots, \xi_{n-1} \in \mathbb{R}. \tag{25.4}
\]

Then there exists a SP $\bigl(X(t)\bigr)$ whose FDDs are given by $\{G_n(\cdot\,;\cdot)\}$ in the sense 
that 

\[
\Pr\bigl[X(t_1) \le \xi_1, \ldots, X(t_n) \le \xi_n\bigr] = G_n(\xi_1, \ldots, \xi_n;\ t_1, \ldots, t_n)
\]

for every $n \in \mathbb{N}$, all $t_1, \ldots, t_n \in \mathbb{R}$, and all $\xi_1, \ldots, \xi_n \in \mathbb{R}$. 

¹ A function $F\colon \mathbb{R}^n \to [0, 1]$ is a valid joint distribution function if there exist random variables 
$X_1, \ldots, X_n$ whose joint distribution function is $F(\cdot)$, i.e., 
$\Pr[X_1 \le \xi_1, \ldots, X_n \le \xi_n] = F(\xi_1, \ldots, \xi_n)$ for all $\xi_1, \ldots, \xi_n \in \mathbb{R}$. 
Not every function $F\colon \mathbb{R}^n \to [0, 1]$ is a valid joint distribution function. For example, a valid joint 
distribution function must be monotonic in each variable. See, for example, (Billingsley, 1995, 
Theorem 12.5) for a characterization of joint distribution functions. 

Proof. See, for example, (Billingsley, 1995, Chapter 7, Section 36), (Cramer and 
Leadbetter, 2004, Section 3.3), (Grimmett and Stirzaker, 2001, Section 8.6), or 
(Doob, 1990, Chapter I § 5). □ 
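
To make the symmetry and consistency properties concrete, the sketch below evaluates the FDDs of a toy process chosen for this illustration only: $X(t) = Y + t$ with $Y$ a standard Gaussian, for which $F_n(\xi_1, \ldots, \xi_n; t_1, \ldots, t_n) = \Phi\bigl(\min_j (\xi_j - t_j)\bigr)$. The process, the function names, and the numerical values are assumptions and do not come from the text.

```python
import math
from itertools import permutations

def phi(x: float) -> float:
    """Cumulative distribution function of a standard Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def fdd_cdf(xis, ts) -> float:
    """F_n(xi_1..xi_n; t_1..t_n) for the toy process X(t) = Y + t, Y ~ N(0, 1).
    The event {X(t_j) <= xi_j for all j} equals {Y <= min_j (xi_j - t_j)}."""
    if len(xis) != len(ts) or not xis:
        raise ValueError("need equally many xi's and t's, and at least one")
    return phi(min(xi - t for xi, t in zip(xis, ts)))

xis, ts = [0.2, -0.4, 1.3], [0.0, 1.0, -2.0]
base = fdd_cdf(xis, ts)

# Symmetry: permute the xi's and the t's by the same permutation.
permuted = [
    fdd_cdf([xis[i] for i in p], [ts[i] for i in p])
    for p in permutations(range(3))
]

# Consistency: let the last xi grow and compare with the (n-1)-dimensional CDF.
marginal = fdd_cdf(xis[:2], ts[:2])
limit = fdd_cdf(xis[:2] + [1e9], ts)
```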

In the study of n-tuples of random variables we can use the joint distribution 
function to answer, at least in principle, most of our probability questions. When it 
comes to stochastic processes, however, there are interesting questions that cannot 
be answered using the FDDs. For example, it can be shown that the probability 
of the event that the SP (X(t)) produces a sample-path that is continuous at time 
zero cannot be computed from the FDDs. This is not due to our limited analytic 
capabilities but rather because there exist two stochastic processes of identical 
FDDs where for one process this event is of zero probability whereas for the other 
it is of probability one (Cramer and Leadbetter, 2004, Section 3.6). Fortunately, 
most of the questions of interest to us in Digital Communications can be answered 
based on the FDDs. 

An exception is a very subtle point related to measurability. From the FDDs alone 
one cannot determine whether the trajectories are measurable functions of time, 
i.e., whether it makes sense to talk about integrals of the form $\int_{-\infty}^{\infty} X(\omega, t)\, dt$. This 
issue will be revisited in Section 25.9. 

The above discussion motivates us to define the set of events whose probability 
can be determined from the FDDs using the axioms of probability, i.e., using the 
rules that the probability of the set of all possible outcomes $\Omega$ is one and that 
the probability of a countable union of disjoint events is the infinite sum of the 
probabilities of the events. In the mathematical literature what we are defining is 
called the $\sigma$-algebra generated by $\bigl(X(t),\ t \in \mathbb{R}\bigr)$ or the $\sigma$-algebra generated 
by the cylindrical sets of $\bigl(X(t),\ t \in \mathbb{R}\bigr)$.² For the classical definition see, for 
example, (Billingsley, 1995, Section 36). 

Definition 25.2.2 ($\sigma$-Algebra Generated by a SP). The $\sigma$-algebra generated 
by a SP $\bigl(X(t),\ t \in \mathbb{R}\bigr)$ which is defined over the probability space $(\Omega, \mathcal{F}, P)$ is 
the set of events (i.e., elements of $\mathcal{F}$) whose probability can be computed from the 
FDDs of $\bigl(X(t)\bigr)$ using only the axioms of probability. 

² It is the smallest $\sigma$-algebra with respect to which all the random variables $\bigl(X(t),\ t \in \mathbb{R}\bigr)$ are 
measurable. 




We now rephrase our previous statement about continuity as saying that the set 
of $\omega \in \Omega$ for which the function $t \mapsto X(\omega, t)$ is continuous at $t = 0$ is not in the 
$\sigma$-algebra generated by $\bigl(X(t)\bigr)$. The probability of such sets cannot be inferred 
from the FDDs alone. If such sets are assigned a probability it must be based on 
some additional information that is not captured by the FDDs. 

The FDDs provide a natural way to define independence between stochastic pro- 
cesses. 

Definition 25.2.3 (Independent Stochastic Processes). Two stochastic processes 
$\bigl(X(t)\bigr)$ and $\bigl(Y(t)\bigr)$ defined on the same probability space $(\Omega, \mathcal{F}, P)$ are said to be 
independent stochastic processes if for every $n \in \mathbb{N}$ and any choice of the 
epochs $t_1, \ldots, t_n \in \mathbb{R}$, the $n$-tuples $\bigl(X(t_1), \ldots, X(t_n)\bigr)$ and $\bigl(Y(t_1), \ldots, Y(t_n)\bigr)$ are 
independent. 



25.3 Definition of a Gaussian SP 

By far the most important processes for modeling noise in Digital Communications 
are the Gaussian processes. Fortunately these processes are among the mathemat- 
ically most tractable. The definition of a Gaussian SP builds on that of a Gaussian 
vector (Definition 23.1.1). 

Definition 25.3.1 (Gaussian Stochastic Processes). A SP $\bigl(X(t)\bigr)$ is said to be a 
Gaussian stochastic process if for every $n \in \mathbb{N}$ and every choice of the epochs 
$t_1, \ldots, t_n \in \mathbb{R}$, the random vector $\bigl(X(t_1), \ldots, X(t_n)\bigr)^{\mathsf{T}}$ is Gaussian. 

Note 25.3.2. Gaussian stochastic processes are of finite variance. 

Proof. If $\bigl(X(t)\bigr)$ is a Gaussian process, then a fortiori at each epoch $t \in \mathbb{R}$, the 
random variable $X(t)$ is a univariate Gaussian (choose $n = 1$ in the above definition) 
and hence, by the definition of the univariate Gaussian distribution (Definition 19.3.1), 
of finite variance. □ 

One of the things that make Gaussian processes tractable is the ease with which 
their FDDs can be specified. 

Proposition 25.3.3 (The FDDs of a Gaussian SP). If(X(t)) is a centered Gaus- 
sian SP, then all its FDDs are determined by the mapping that specifies the covari- 
ance between any two of its samples: 

(i 1 ,i 2 )^Cov[X(i 1 ),X(t 2 )], h,t 2 £R. (25.5) 

Proof. Let $(X(t))$ be a centered Gaussian SP. We shall show that for any choice of
the epochs $t_1, \ldots, t_n \in \mathbb{R}$ we can compute the joint distribution of
$X(t_1), \ldots, X(t_n)$ from the mapping (25.5). To this end we note that since $(X(t))$
is a Gaussian SP, the random vector $(X(t_1), \ldots, X(t_n))^{\mathsf{T}}$ is Gaussian
(Definition 25.3.1). Consequently, its distribution is fully specified by its mean
vector and covariance matrix (Theorem 23.6.7). Its mean vector is zero, because we
assumed that $(X(t))$ is centered. To conclude the proof we thus only need to show
that the covariance matrix of $(X(t_1), \ldots, X(t_n))^{\mathsf{T}}$ is determined by the
mapping (25.5). But this is obvious because the covariance matrix of
$(X(t_1), \ldots, X(t_n))^{\mathsf{T}}$ is the $n \times n$ matrix

\[
\begin{pmatrix}
\operatorname{Cov}[X(t_1), X(t_1)] & \operatorname{Cov}[X(t_1), X(t_2)] & \cdots & \operatorname{Cov}[X(t_1), X(t_n)] \\
\vdots & \vdots & \ddots & \vdots \\
\operatorname{Cov}[X(t_n), X(t_1)] & \operatorname{Cov}[X(t_n), X(t_2)] & \cdots & \operatorname{Cov}[X(t_n), X(t_n)]
\end{pmatrix}
\tag{25.6}
\]

and each of the entries in this matrix is specified by the mapping (25.5). □
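As a hedged illustration of how the matrix (25.6) is assembled from the mapping (25.5), the following Python sketch builds the covariance matrix for a given list of epochs. The function names and the example mapping $(t_1, t_2) \mapsto 1 + t_1 t_2$ are assumptions made for illustration only; this mapping is the covariance of the centered Gaussian SP $X(t) = Z_0 + t Z_1$, where $Z_0$ and $Z_1$ are independent standard Gaussians.

```python
def covariance_matrix(cov, epochs):
    """Return the n x n matrix of (25.6): entry (i, j) is cov(t_i, t_j)."""
    return [[cov(t_i, t_j) for t_j in epochs] for t_i in epochs]


def example_cov(t1, t2):
    # Hypothetical covariance mapping (25.5) for X(t) = Z0 + t * Z1,
    # where Z0 and Z1 are independent standard Gaussians.
    return 1.0 + t1 * t2


epochs = [0.0, 1.0, 2.0]
matrix = covariance_matrix(example_cov, epochs)
for row in matrix:
    print(row)
```

Together with the zero mean vector, this matrix is all that is needed to specify the joint Gaussian distribution of the samples at the chosen epochs.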

Things become even simpler if the Gaussian process is wide-sense stationary
(Definition 25.4.2 ahead). In this case the RHS of (25.5) is determined by $t_1 - t_2$,
so the mapping (25.5) (and hence all the FDDs) is determined by the mapping
$\tau \mapsto \operatorname{Cov}[X(t), X(t + \tau)]$. But before discussing wide-sense stationary Gaussian
stochastic processes in Section 25.5, we first define stationarity and wide-sense
stationarity for general processes that are not necessarily Gaussian.



25.4 Stationary Continuous-Time Processes 

Our treatment of stationary continuous-time processes is similar to the treatment 
of their discrete-time counterparts (Chapter 13). The following is the continuous- 
time analogue of Definition 13.2.1. 

Definition 25.4.1 (Stationary Continuous-Time SP). We say that a continuous-time
SP $(X(t))$ is stationary (or strict sense stationary, or strongly stationary)
if for every $n \in \mathbb{N}$, any epochs $t_1, \ldots, t_n \in \mathbb{R}$, and every $\tau \in \mathbb{R}$,

\[
\bigl(X(t_1 + \tau), \ldots, X(t_n + \tau)\bigr) \stackrel{\mathscr{L}}{=} \bigl(X(t_1), \ldots, X(t_n)\bigr). \tag{25.7}
\]

By considering the case where $n = 1$ we obtain that if $(X(t))$ is stationary, then
all its samples have the same distribution

\[
X(t) \stackrel{\mathscr{L}}{=} X(t + \tau), \quad t, \tau \in \mathbb{R}. \tag{25.8}
\]

That is, the distribution of the random variable $X(t)$ does not depend on $t$. By
considering $n = 2$ we obtain that if $(X(t))$ is stationary, then the joint distribution
of any two of its samples depends on how far apart they are and not on the absolute
time at which they are taken

\[
\bigl(X(t_1), X(t_2)\bigr) \stackrel{\mathscr{L}}{=} \bigl(X(t_1 + \tau), X(t_2 + \tau)\bigr), \quad t_1, t_2, \tau \in \mathbb{R}. \tag{25.9}
\]

That is, the joint distribution of $(X(t_1), X(t_2))$ can be computed from $t_2 - t_1$.

As we did for discrete-time processes (Definition 13.3.1), we can also define
wide-sense stationarity of continuous-time processes. Recall that a process $(X(t))$ is
said to be of finite variance if at every time $t \in \mathbb{R}$ the random variable $X(t)$ is of
finite variance.

Definition 25.4.2 (Wide-Sense Stationary Continuous-Time SP). A continuous-time
SP $(X(t))$ is said to be wide-sense stationary (or weakly stationary or
second-order stationary) if the following three conditions are met:

1) It is of finite variance.

2) Its mean is constant

\[
\mathrm{E}[X(t)] = \mathrm{E}[X(t + \tau)], \quad t, \tau \in \mathbb{R}. \tag{25.10}
\]

3) The covariance between its samples satisfies

\[
\operatorname{Cov}[X(t_1), X(t_2)] = \operatorname{Cov}[X(t_1 + \tau), X(t_2 + \tau)], \quad t_1, t_2, \tau \in \mathbb{R}. \tag{25.11}
\]

By considering the case where $t_1 = t_2$ in (25.11), we obtain that all the samples of
a WSS SP have the same variance:

\[
\operatorname{Var}[X(t)] = \operatorname{Var}[X(0)], \quad t \in \mathbb{R}. \tag{25.12}
\]

Note 25.4.3. Every finite-variance stationary SP is WSS. 

Proof. This follows because (25.8) implies (25.10), and because (25.9) implies
(25.11). □

The reverse is not true: some WSS processes are not stationary. (Wide-sense 
stationarity concerns only means and covariances, whereas stationarity has to do 
with distributions.) 

The following definition of the autocovariance function of a continuous-time WSS 
SP is the analogue of Definition 13.5.1. 

Definition 25.4.4 (Autocovariance Function). The autocovariance function
$K_{XX} \colon \mathbb{R} \to \mathbb{R}$ of a WSS continuous-time SP $(X(t))$ is defined for every $\tau \in \mathbb{R}$ by

\[
K_{XX}(\tau) \triangleq \operatorname{Cov}[X(t + \tau), X(t)], \tag{25.13}
\]

where the RHS does not depend on $t$ because $(X(t))$ is assumed to be WSS.

By evaluating (25.13) at $\tau = 0$ and using (25.12), we can express the variance
of $X(t)$ in terms of the autocovariance function $K_{XX}$ as

\[
\operatorname{Var}[X(t)] = K_{XX}(0), \quad t \in \mathbb{R}. \tag{25.14}
\]

We end this section with a few simple inequalities related to WSS stochastic pro- 
cesses and their autocovariance functions. 

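Lemma 25.4.5. Let $(X(t))$ be a WSS SP of autocovariance function $K_{XX}$. Then

\[
|K_{XX}(\tau)| \le K_{XX}(0), \quad \tau \in \mathbb{R}, \tag{25.15}
\]

\[
\mathrm{E}[|X(t)|] \le \sqrt{K_{XX}(0) + (\mathrm{E}[X(0)])^2}, \quad t \in \mathbb{R}, \tag{25.16}
\]

and

\[
\mathrm{E}[|X(t) X(t')|] \le K_{XX}(0) + (\mathrm{E}[X(0)])^2, \quad t, t' \in \mathbb{R}. \tag{25.17}
\]

Proof. Inequality (25.15) follows from the Covariance Inequality (Corollary 3.5.2):

\begin{align*}
|K_{XX}(\tau)| &= \bigl|\operatorname{Cov}[X(t + \tau), X(t)]\bigr| \\
&\le \sqrt{\operatorname{Var}[X(t + \tau)]} \sqrt{\operatorname{Var}[X(t)]} \\
&= K_{XX}(0),
\end{align*}

where the last equality follows from (25.14).

Inequality (25.16) follows from the nonnegativity of the variance of $|X(t)|$ and the
assumption that $(X(t))$ is WSS:

\begin{align*}
0 &\le \operatorname{Var}[|X(t)|] \\
&= \mathrm{E}[X^2(t)] - (\mathrm{E}[|X(t)|])^2 \\
&= \operatorname{Var}[X(t)] + (\mathrm{E}[X(t)])^2 - (\mathrm{E}[|X(t)|])^2 \\
&= K_{XX}(0) + (\mathrm{E}[X(0)])^2 - (\mathrm{E}[|X(t)|])^2.
\end{align*}

Finally, Inequality (25.17) follows from the Cauchy-Schwarz Inequality for random
variables (Theorem 3.5.1)

\[
|\mathrm{E}[UV]| \le \sqrt{\mathrm{E}[U^2]\, \mathrm{E}[V^2]}
\]

by substituting $|X(t)|$ for $U$ and $|X(t')|$ for $V$ and by noting that

\begin{align*}
\mathrm{E}[|X(t)|^2] &= \mathrm{E}[X^2(t)] \\
&= \operatorname{Var}[X(t)] + (\mathrm{E}[X(t)])^2 \\
&= K_{XX}(0) + (\mathrm{E}[X(0)])^2, \quad t \in \mathbb{R}. \qquad \Box
\end{align*}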

25.5 Stationary Gaussian Stochastic Processes 

For Gaussian stochastic processes we do not distinguish between stationarity and 
wide-sense stationarity. The reason is that, while for general processes the two 
concepts are different (in that every finite-variance stationary SP is WSS, but not
every WSS SP is stationary), for Gaussian stochastic processes the two concepts are 
equivalent. These relationships between stationarity and wide-sense stationarity for 
general stochastic processes and for Gaussian stochastic processes are illustrated 
in Figure 25.1. 

Proposition 25.5.1 (Stationary Gaussian Stochastic Processes). 

(i) A Gaussian SP is stationary if, and only if, it is WSS. 

(ii) The FDDs of a centered stationary Gaussian SP are fully specified by its 
autocovariance function. 

Proof. We begin by proving (i). One direction has only little to do with
Gaussianity. Since every Gaussian SP is of finite variance (Note 25.3.2), and since every
finite-variance stationary SP is WSS (Note 25.4.3), it follows that every stationary
Gaussian SP is WSS.

Figure 25.1: The relationship between wide-sense stationarity, Gaussianity, and
strict-sense stationarity. The outer region corresponds to all stochastic processes.
Within it is the set of all finite-variance processes and within that the set of all
wide-sense stationary processes. The slanted lines indicate the strict-sense stationary
processes, and the gray area corresponds to the Gaussian stochastic processes.

Gaussianity plays a much more important role in the proof of the reverse direction,
namely, that every WSS Gaussian SP is stationary. We prove this by showing that
if $(X(t))$ is Gaussian and WSS, then for every $n \in \mathbb{N}$ and any $t_1, \ldots, t_n, \tau \in \mathbb{R}$
the joint distribution of $X(t_1), \ldots, X(t_n)$ is identical to the joint distribution of
$X(t_1 + \tau), \ldots, X(t_n + \tau)$. To this end, let $n \in \mathbb{N}$ and $t_1, \ldots, t_n, \tau \in \mathbb{R}$ be fixed.
Because $(X(t))$ is Gaussian, $(X(t_1), \ldots, X(t_n))^{\mathsf{T}}$ and $(X(t_1 + \tau), \ldots, X(t_n + \tau))^{\mathsf{T}}$
are both Gaussian vectors (Definition 25.3.1). And since $(X(t))$ is WSS, the two
are of the same mean vector (see (25.10)). The former's covariance matrix is

\[
\begin{pmatrix}
\operatorname{Cov}[X(t_1), X(t_1)] & \cdots & \operatorname{Cov}[X(t_1), X(t_n)] \\
\vdots & \ddots & \vdots \\
\operatorname{Cov}[X(t_n), X(t_1)] & \cdots & \operatorname{Cov}[X(t_n), X(t_n)]
\end{pmatrix}
\]

and the latter's is

\[
\begin{pmatrix}
\operatorname{Cov}[X(t_1 + \tau), X(t_1 + \tau)] & \cdots & \operatorname{Cov}[X(t_1 + \tau), X(t_n + \tau)] \\
\vdots & \ddots & \vdots \\
\operatorname{Cov}[X(t_n + \tau), X(t_1 + \tau)] & \cdots & \operatorname{Cov}[X(t_n + \tau), X(t_n + \tau)]
\end{pmatrix}.
\]

Since $(X(t))$ is WSS, the two covariance matrices are identical (see (25.11)). But
two Gaussian vectors of equal mean vectors and of equal covariance matrices have
identical distributions (Theorem 23.6.7), so the distribution of $(X(t_1), \ldots, X(t_n))^{\mathsf{T}}$
is identical to that of $(X(t_1 + \tau), \ldots, X(t_n + \tau))^{\mathsf{T}}$. Since this has been established for
all choices of $n \in \mathbb{N}$ and all choices of $t_1, \ldots, t_n, \tau \in \mathbb{R}$, the SP $(X(t))$ is stationary.

Part (ii) follows from Proposition 25.3.3 and the definition of wide-sense
stationarity. Indeed, by Proposition 25.3.3, all the FDDs of a centered Gaussian SP $(X(t))$
are determined by the mapping (25.5). If $(X(t))$ is additionally WSS, then the
RHS of (25.5) can be computed from $t_1 - t_2$ and is given by $K_{XX}(t_1 - t_2)$, so the
mapping (25.5) is fully specified by the autocovariance function $K_{XX}$. □
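The role of wide-sense stationarity in this argument can be sketched numerically: when the covariance depends only on the time difference, shifting all the epochs by the same amount leaves the covariance matrix unchanged. The Python sketch below is only an illustration; the function names and the autocovariance $\tau \mapsto e^{-|\tau|}$ are assumptions chosen for the example, not taken from the text.

```python
import math


def wss_covariance_matrix(K, epochs):
    """Covariance matrix with entry (i, j) equal to K(t_i - t_j)."""
    return [[K(t_i - t_j) for t_j in epochs] for t_i in epochs]


def example_K(tau):
    # Hypothetical autocovariance function, used only for illustration.
    return math.exp(-abs(tau))


epochs = [0.0, 0.3, 1.7]
shift = 5.0
original = wss_covariance_matrix(example_K, epochs)
shifted = wss_covariance_matrix(example_K, [t + shift for t in epochs])

# Largest entrywise difference between the two matrices (rounding error only).
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(original, shifted)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

For a centered Gaussian SP the two matrices determine the two joint distributions, so their agreement is exactly what stationarity requires.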



25.6 Properties of the Autocovariance Function 

Many of the definitions and results on continuous-time WSS stochastic processes 
have analogous discrete-time counterparts. But some technical issues are encoun- 
tered only in continuous time. For example, most results on continuous-time WSS 
stochastic processes require that the autocovariance function of the process be 
continuous at the origin, i.e., satisfy

\[
\lim_{\delta \to 0} K_{XX}(\delta) = K_{XX}(0), \tag{25.18}
\]

and this condition has no discrete-time counterpart. As we next show, this
condition is equivalent to the condition

\[
\lim_{\delta \to 0} \mathrm{E}\bigl[(X(t + \delta) - X(t))^2\bigr] = 0, \quad t \in \mathbb{R}. \tag{25.19}
\]

This equivalence follows from the identity

\[
\mathrm{E}\bigl[(X(t) - X(t + \delta))^2\bigr] = 2\bigl(K_{XX}(0) - K_{XX}(\delta)\bigr), \quad t, \delta \in \mathbb{R}, \tag{25.20}
\]

which can be proved as follows. We first note that it suffices to prove it for centered
processes, and for such processes we then compute:

\begin{align*}
\mathrm{E}\bigl[(X(t) - X(t + \delta))^2\bigr] &= \mathrm{E}\bigl[X^2(t) - 2 X(t) X(t + \delta) + X^2(t + \delta)\bigr] \\
&= \mathrm{E}[X^2(t)] - 2\, \mathrm{E}[X(t) X(t + \delta)] + \mathrm{E}[X^2(t + \delta)] \\
&= K_{XX}(0) - 2 K_{XX}(\delta) + K_{XX}(0) \\
&= 2\bigl(K_{XX}(0) - K_{XX}(\delta)\bigr),
\end{align*}

where the first equality follows by opening the square; the second by the linearity of
expectation; the third by the definition of $K_{XX}$; and the final equality by collecting
terms.
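The identity (25.20) can be checked on a simple example. The sketch below assumes a hypothetical random-phase sinusoid $X(t) = \sqrt{2} \cos(2\pi f_0 t + \Theta)$, with $\Theta$ uniform over eight equally spaced phases; for this process the mean is zero and $K_{XX}(\tau) = \cos(2\pi f_0 \tau)$, and expectations reduce to finite averages. The process, the constants, and the function names are illustrative assumptions.

```python
import math

F0 = 1.5        # hypothetical frequency of the sinusoid
N_PHASES = 8    # number of equally likely phases (any value >= 3 works)
PHASES = [2.0 * math.pi * k / N_PHASES for k in range(N_PHASES)]


def x(theta, t):
    """Sample path of X(t) = sqrt(2) cos(2 pi F0 t + Theta) for Theta = theta."""
    return math.sqrt(2.0) * math.cos(2.0 * math.pi * F0 * t + theta)


def expectation(g):
    """Expectation of g(Theta) when Theta is uniform over PHASES."""
    return sum(g(theta) for theta in PHASES) / N_PHASES


def autocovariance(tau):
    """K_XX(tau) = cos(2 pi F0 tau) for this example process."""
    return math.cos(2.0 * math.pi * F0 * tau)


t, delta = 0.37, 0.1
lhs = expectation(lambda theta: (x(theta, t) - x(theta, t + delta)) ** 2)
rhs = 2.0 * (autocovariance(0.0) - autocovariance(delta))
print(lhs, rhs)
```

The two printed values agree up to rounding, and both shrink to zero as `delta` is made small, which is the equivalence between (25.18) and (25.19) for this example.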

We note here that if the autocovariance function of a WSS process is continuous 
at the origin, then it is continuous everywhere. In fact, it is uniformly continuous: 

Lemma 25.6.1. If the autocovariance function of a WSS continuous-time SP is
continuous at the origin, then it is a uniformly continuous function.

Proof. We first note that it suffices to prove the lemma for centered processes.
Let $(X(t))$ be such a process. For every $\tau, \delta \in \mathbb{R}$ we then have

\begin{align*}
|K_{XX}(\tau + \delta) - K_{XX}(\tau)| &= \bigl|\mathrm{E}[X(\tau + \delta) X(0)] - \mathrm{E}[X(\tau) X(0)]\bigr| \\
&= \bigl|\mathrm{E}\bigl[(X(\tau + \delta) - X(\tau)) X(0)\bigr]\bigr| \\
&= \bigl|\operatorname{Cov}[X(\tau + \delta) - X(\tau), X(0)]\bigr| \\
&\le \sqrt{\mathrm{E}\bigl[(X(\tau + \delta) - X(\tau))^2\bigr]} \sqrt{\mathrm{E}[X^2(0)]} \\
&= \sqrt{2\bigl(K_{XX}(0) - K_{XX}(\delta)\bigr)} \sqrt{K_{XX}(0)} \\
&= \sqrt{2 K_{XX}(0) \bigl(K_{XX}(0) - K_{XX}(\delta)\bigr)}, \tag{25.21}
\end{align*}

where the equality in the first line follows from the definition of the autocovariance
function because $(X(t))$ is centered; the equality in the second line by the linearity
of expectation; the equality in the third line by the definition of the covariance
between two zero-mean random variables; the inequality in the fourth line by the
Covariance Inequality (Corollary 3.5.2); the equality in the fifth line by (25.20);
and the final equality by trivial algebra. The uniform continuity of $K_{XX}$ now
follows from (25.21) by noting that its RHS does not depend on $\tau$ and that, by our
assumption about the continuity of $K_{XX}$ at zero, it tends to zero as $\delta \to 0$. □

We next derive two important properties of autocovariance functions and then 
demonstrate in Theorem 25.6.2 that these properties characterize those functions 
that can arise as the autocovariance functions of a WSS SP. These properties are 
the continuous-time analogues of (13.12) & (13.13), and the proofs are almost 
identical. We first state the properties and then proceed to prove them. 

The first property is that the autocovariance function $K_{XX}$ of any continuous-time
WSS process $(X(t))$ is a symmetric function

\[
K_{XX}(-\tau) = K_{XX}(\tau), \quad \tau \in \mathbb{R}. \tag{25.22}
\]

The second is that it is a positive definite function in the sense that for every
$n \in \mathbb{N}$, and for every choice of the coefficients $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ and of the epochs
$t_1, \ldots, t_n \in \mathbb{R}$

\[
\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} K_{XX}(t_\nu - t_{\nu'}) \ge 0. \tag{25.23}
\]

To prove (25.22) we calculate

\begin{align*}
K_{XX}(\tau) &= \operatorname{Cov}[X(t + \tau), X(t)] \\
&= \operatorname{Cov}[X(t'), X(t' - \tau)] \\
&= \operatorname{Cov}[X(t' - \tau), X(t')] \\
&= K_{XX}(-\tau), \quad \tau \in \mathbb{R},
\end{align*}

where the first equality follows from the definition of $K_{XX}(\tau)$ (25.13); the second
by defining $t' = t + \tau$; the third because $\operatorname{Cov}[X, Y] = \operatorname{Cov}[Y, X]$ (for real random
variables); and the final equality by the definition of $K_{XX}(-\tau)$ (25.13).

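To prove (25.23) we compute

\begin{align*}
\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} K_{XX}(t_\nu - t_{\nu'})
&= \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} \operatorname{Cov}[X(t_\nu), X(t_{\nu'})] \\
&= \operatorname{Cov}\Biggl[\sum_{\nu=1}^{n} \alpha_\nu X(t_\nu), \sum_{\nu'=1}^{n} \alpha_{\nu'} X(t_{\nu'})\Biggr] \\
&= \operatorname{Var}\Biggl[\sum_{\nu=1}^{n} \alpha_\nu X(t_\nu)\Biggr] \\
&\ge 0. \tag{25.24}
\end{align*}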
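Properties (25.22) and (25.23) are easy to probe numerically for a candidate function. The following Python sketch evaluates the double sum in (25.23) for randomly drawn coefficients and epochs; the function names, the random test design, and the candidate $\tau \mapsto e^{-|\tau|}$ (a known autocovariance function) are assumptions made for illustration. Such a check can expose a function that is not positive definite, but passing it is of course not a proof.

```python
import math
import random


def quadratic_form(K, alphas, epochs):
    """Double sum of alpha_v * alpha_v' * K(t_v - t_v'), as in (25.23)."""
    return sum(
        a * b * K(t - u)
        for a, t in zip(alphas, epochs)
        for b, u in zip(alphas, epochs)
    )


def example_K(tau):
    # Hypothetical candidate autocovariance function.
    return math.exp(-abs(tau))


rng = random.Random(0)  # fixed seed so that the run is reproducible
values = []
for _ in range(200):
    n = rng.randint(1, 6)
    alphas = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    epochs = [rng.uniform(-3.0, 3.0) for _ in range(n)]
    values.append(quadratic_form(example_K, alphas, epochs))

smallest = min(values)
print(smallest)
```

By (25.24) each value is the variance of a linear combination of samples, which is why none of them can be negative beyond rounding error.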







The next theorem demonstrates that Properties (25.22) and (25.23) characterize
the autocovariance functions of WSS stochastic processes (cf. Theorem 13.5.2).

Theorem 25.6.2. Every symmetric positive definite function is the autocovariance
function of some stationary Gaussian SP.

Proof. The proof is based on Kolmogorov's Existence Theorem (Theorem 25.2.1)
and is only sketched here. Let $K(\cdot)$ be a symmetric and positive definite function
from $\mathbb{R}$ to $\mathbb{R}$. The idea is to consider for every $n \in \mathbb{N}$ and for every choice of the
epochs $t_1, \ldots, t_n \in \mathbb{R}$ the joint distribution function $G_n(\cdot\,; t_1, \ldots, t_n)$ corresponding
to the centered multivariate Gaussian distribution of covariance matrix

\[
\begin{pmatrix}
K(t_1 - t_1) & K(t_1 - t_2) & \cdots & K(t_1 - t_n) \\
\vdots & \vdots & \ddots & \vdots \\
K(t_n - t_1) & K(t_n - t_2) & \cdots & K(t_n - t_n)
\end{pmatrix}
\]

and to verify that the sequence $\{G_n(\cdot\,; \cdot)\}$ satisfies the symmetry and consistency
requirements of Kolmogorov's Existence Theorem. The details, which can be found
in (Doob, 1990, Chapter II, Section § 3, Theorem 3.1), are omitted. □



25.7 The Power Spectral Density of a Continuous-Time SP 



Under suitable conditions, engineers usually define the power spectral density of a
WSS SP as the Fourier Transform of its autocovariance function. There is nothing
wrong with this definition, and we encourage the reader to think about the PSD
in this way.³ We, however, prefer a slightly more general definition that allows us
also to consider discontinuous spectra and, more importantly, allows us to infer
that any integrable, nonnegative, symmetric function is the PSD of some Gaussian
SP (Proposition 25.7.3). Fortunately, the two definitions agree whenever the
autocovariance function is continuous and integrable.

Before defining the PSD, we pause to discuss the Fourier Transform of the
autocovariance. If the autocovariance function $K_{XX}$ of a WSS SP $(X(t))$ is integrable,
i.e., if

\[
\int_{-\infty}^{\infty} |K_{XX}(\tau)| \, d\tau < \infty, \tag{25.25}
\]

then we can discuss its FT $\hat{K}_{XX}$. The following proposition summarizes the main
properties of the FT of continuous integrable autocovariance functions.

Proposition 25.7.1. If the autocovariance function $K_{XX}$ is continuous at the origin
and integrable, then its Fourier Transform $\hat{K}_{XX}$ is nonnegative

\[
\hat{K}_{XX}(f) \ge 0, \quad f \in \mathbb{R}, \tag{25.26}
\]

and symmetric

\[
\hat{K}_{XX}(-f) = \hat{K}_{XX}(f), \quad f \in \mathbb{R}. \tag{25.27}
\]

Moreover, the Inverse Fourier Transform recovers $K_{XX}$ in the sense that

\[
K_{XX}(\tau) = \int_{-\infty}^{\infty} \hat{K}_{XX}(f) \, e^{i 2\pi f \tau} \, df, \quad \tau \in \mathbb{R}. \tag{25.28}
\]

Proof. This result can be deduced from three results in (Feller, 1971,
Chapter XIX): the theorem in Section 3, Bochner's Theorem in Section 2, and Lemma 2
in Section 2. □

Definition 25.7.2 (The PSD of a Continuous-Time WSS SP). We say that the
WSS continuous-time SP $(X(t))$ is of power spectral density (PSD) $S_{XX}$ if $S_{XX}$
is a nonnegative, symmetric, integrable function from $\mathbb{R}$ to $\mathbb{R}$ whose Inverse Fourier
Transform is the autocovariance function $K_{XX}$ of $(X(t))$:

\[
K_{XX}(\tau) = \int_{-\infty}^{\infty} S_{XX}(f) \, e^{i 2\pi f \tau} \, df, \quad \tau \in \mathbb{R}. \tag{25.29}
\]

³ Engineers can, however, be a bit sloppy in that they sometimes speak of a SP whose PSD
is discontinuous, e.g., the Brickwall function $f \mapsto \mathrm{I}\{|f| \le W\}$. This is inconsistent with their
definition because the FT of an integrable function must be continuous (Theorem 6.2.11), and
consequently if the autocovariance function is integrable then its FT cannot be discontinuous.
Our more general definition does not suffer from this problem and allows for discontinuous PSDs.

⁴ Recall that without additional assumptions one is not guaranteed that the Inverse Fourier
Transform of the Fourier Transform of a function will be identical to the original function. Here we
need not make any additional assumptions because we already assumed that the autocovariance
function is continuous and because autocovariance functions are positive definite.

A few remarks regarding this definition:

(i) By the uniqueness of the IFT (the analogue of Theorem 6.2.12 for the IFT) it
follows that if two functions are PSDs of the same WSS SP, then they must be
equal except on a set of frequencies of Lebesgue measure zero. Consequently,
we shall often speak of "the" PSD as though it were unique.

(ii) By Proposition 25.7.1, if $K_{XX}$ is continuous and integrable, then $(X(t))$ has
a PSD in the sense of Definition 25.7.2, and this PSD is the FT of $K_{XX}$.
There are, however, autocovariance functions that are not integrable and
that nonetheless have a PSD in the sense of Definition 25.7.2. For example,
$\tau \mapsto \operatorname{sinc}(\tau)$.

Thus, every continuous autocovariance function that has a PSD in the en- 
gineers' sense (i.e., that is integrable) also has the same PSD according to 
our definition, but our definition is more general in that some autocovariance 
functions that have a PSD according to our definition are not integrable and 
therefore do not have a PSD in the engineers' sense. 

(iii) By substituting $\tau = 0$ in (25.29) and using (25.14) we can express the variance
of $X(t)$ in terms of the PSD $S_{XX}$ as

\[
\operatorname{Var}[X(t)] = \int_{-\infty}^{\infty} S_{XX}(f) \, df, \quad t \in \mathbb{R}. \tag{25.30}
\]

(iv) Only processes with continuous autocovariance functions have PSDs, because 
the RHS of (25.29), being the IFT of an integrable function, must be contin- 
uous (Theorem 6.2.11 (ii)). 

(v) It can be shown that if the autocovariance function can be written as the 
IFT of some integrable function, then this latter function must be nonneg- 
ative (except on a set of frequencies of Lebesgue measure zero). This is the 
continuous-time analogue of Proposition 13.6.3. 

The nonnegativity, symmetry, and integrability conditions characterize PSDs in 
the following sense: 

Proposition 25.7.3. Every nonnegative, symmetric, integrable function is the PSD 
of some stationary Gaussian SP whose autocovariance function is continuous. 

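Proof. Let $S(\cdot)$ be some integrable, nonnegative, and symmetric function from $\mathbb{R}$
to the nonnegative reals. Define $K(\cdot)$ to be its IFT

\[
K(\tau) = \int_{-\infty}^{\infty} S(f) \, e^{i 2\pi f \tau} \, df, \quad \tau \in \mathbb{R}. \tag{25.31}
\]

We shall verify that $K(\cdot)$ satisfies the hypotheses of Theorem 25.6.2, namely, that
it is symmetric and positive definite. It will then follow from Theorem 25.6.2 that
there exists a stationary Gaussian SP $(X(t))$ whose autocovariance function $K_{XX}$
is equal to $K(\cdot)$ and is thus given by

\[
K_{XX}(\tau) = \int_{-\infty}^{\infty} S(f) \, e^{i 2\pi f \tau} \, df, \quad \tau \in \mathbb{R}. \tag{25.32}
\]

This will establish that $(X(t))$ is of PSD $S(\cdot)$. The continuity of $K_{XX}$ will follow
from the continuity of the IFT of integrable functions (Theorem 6.2.11).

To conclude the proof we need to show that the function $K(\cdot)$ defined in (25.31)
is symmetric and positive definite. The symmetry follows from our assumption
that $S(\cdot)$ is symmetric:

\begin{align*}
K(-\tau) &= \int_{-\infty}^{\infty} S(f) \, e^{i 2\pi f (-\tau)} \, df \\
&= \int_{-\infty}^{\infty} S(-\tilde{f}) \, e^{i 2\pi \tilde{f} \tau} \, d\tilde{f} \\
&= \int_{-\infty}^{\infty} S(\tilde{f}) \, e^{i 2\pi \tilde{f} \tau} \, d\tilde{f} \\
&= K(\tau), \quad \tau \in \mathbb{R},
\end{align*}

where the first equality follows from (25.31); the second from the change of variable
$\tilde{f} = -f$; the third by the symmetry of $S(\cdot)$; and the final equality again by (25.31).

We next prove that $K(\cdot)$ is positive definite. To that end we fix some $n \in \mathbb{N}$, some
constants $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$, and some epochs $t_1, \ldots, t_n \in \mathbb{R}$ and compute:

\begin{align*}
\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} K(t_\nu - t_{\nu'})
&= \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} \int_{-\infty}^{\infty} S(f) \, e^{i 2\pi f (t_\nu - t_{\nu'})} \, df \\
&= \int_{-\infty}^{\infty} S(f) \Biggl(\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} \, e^{i 2\pi f (t_\nu - t_{\nu'})}\Biggr) df \\
&= \int_{-\infty}^{\infty} S(f) \Biggl(\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu e^{i 2\pi f t_\nu} \, \alpha_{\nu'} e^{-i 2\pi f t_{\nu'}}\Biggr) df \\
&= \int_{-\infty}^{\infty} S(f) \Biggl(\sum_{\nu=1}^{n} \alpha_\nu e^{i 2\pi f t_\nu}\Biggr) \Biggl(\sum_{\nu'=1}^{n} \alpha_{\nu'} e^{i 2\pi f t_{\nu'}}\Biggr)^{\!*} df \\
&= \int_{-\infty}^{\infty} S(f) \, \Biggl|\sum_{\nu=1}^{n} \alpha_\nu e^{i 2\pi f t_\nu}\Biggr|^2 df \\
&\ge 0,
\end{align*}

where the first equality follows from (25.31); the subsequent equalities by simple
algebra; and the last inequality from our assumption that $S(\cdot)$ is nonnegative. □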
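To make the construction in (25.31) concrete, the sketch below numerically evaluates the IFT of a Brickwall PSD $f \mapsto \mathrm{I}\{|f| \le W\}$ and compares it with the closed form $\sin(2\pi W \tau)/(\pi \tau)$. The bandwidth, the midpoint-rule integration, the truncation of the integral to $[-2W, 2W]$, and all function names are assumptions of this illustration. Because the PSD is symmetric, only the cosine part of the exponential contributes.

```python
import math

W = 2.0  # hypothetical bandwidth of the Brickwall PSD


def brickwall_psd(f):
    """S(f) = 1 if |f| <= W, and 0 otherwise."""
    return 1.0 if abs(f) <= W else 0.0


def autocovariance_from_psd(S, tau, f_max, n_points=8000):
    """Midpoint-rule approximation of the IFT (25.31) over [-f_max, f_max].

    S is assumed symmetric, so the imaginary part of the integral vanishes
    and only the cosine term is accumulated.
    """
    df = 2.0 * f_max / n_points
    total = 0.0
    for k in range(n_points):
        f = -f_max + (k + 0.5) * df
        total += S(f) * math.cos(2.0 * math.pi * f * tau)
    return total * df


def closed_form(tau):
    """Exact IFT of the Brickwall PSD: sin(2 pi W tau) / (pi tau)."""
    if tau == 0.0:
        return 2.0 * W
    return math.sin(2.0 * math.pi * W * tau) / (math.pi * tau)


taus = [0.0, 0.1, 0.3, 1.0]
approx = [autocovariance_from_psd(brickwall_psd, tau, f_max=2.0 * W) for tau in taus]
exact = [closed_form(tau) for tau in taus]
for tau, a, e in zip(taus, approx, exact):
    print(tau, a, e)
```

The value at $\tau = 0$ is the integral of the PSD, which by (25.30) is the variance of the process.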



25.8 The Spectral Distribution Function 

In this section we shall state without proof Bochner's Theorem on continuous
positive definite functions and discuss its application to continuous autocovariance
functions. We shall then define the spectral distribution function of WSS stochastic
processes. The concept of a spectral distribution function is more general than
that of a PSD, because every WSS SP with a continuous autocovariance function has
a spectral distribution function, but only some have a PSD. Nevertheless, for our
purposes, the notion of PSD will suffice, and the results of this section will not be
used in the rest of the book.

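Recall that the characteristic function $\Phi_X(\cdot)$ of a RV $X$ is the mapping from $\mathbb{R}$
to $\mathbb{C}$ defined by

\[
\varpi \mapsto \mathrm{E}\bigl[e^{i \varpi X}\bigr], \quad \varpi \in \mathbb{R}. \tag{25.33}
\]

If $X$ is symmetric (i.e., has a symmetric distribution) in the sense that

\[
\Pr[X \ge x] = \Pr[X \le -x], \quad x \in \mathbb{R}, \tag{25.34}
\]

then $\Phi_X(\cdot)$ only takes on real values and is a symmetric function, as the following
argument shows. The symmetry of the distribution of $X$ implies that $X$ and $-X$
have the same distribution, which implies that their exponentiations have the same
law

\[
e^{i \varpi X} \stackrel{\mathscr{L}}{=} e^{-i \varpi X}, \quad \varpi \in \mathbb{R}, \tag{25.35}
\]

and a fortiori that the expectations of the two exponentials are equal

\[
\mathrm{E}\bigl[e^{i \varpi X}\bigr] = \mathrm{E}\bigl[e^{-i \varpi X}\bigr], \quad \varpi \in \mathbb{R}. \tag{25.36}
\]

The LHS of (25.36) is $\Phi_X(\varpi)$, and the RHS is $\Phi_X(-\varpi)$, thus demonstrating the
symmetry of $\Phi_X(\cdot)$. To establish that (25.34) also implies that $\Phi_X(\cdot)$ is real, we
note that, by (25.36),

\begin{align*}
\Phi_X(\varpi) &= \mathrm{E}\bigl[e^{i \varpi X}\bigr] \\
&= \frac{1}{2} \Bigl(\mathrm{E}\bigl[e^{i \varpi X}\bigr] + \mathrm{E}\bigl[e^{-i \varpi X}\bigr]\Bigr) \\
&= \mathrm{E}[\cos(\varpi X)], \quad \varpi \in \mathbb{R},
\end{align*}

which is real. Here the first equality follows from (25.33); the second from (25.36);
and the third from the linearity of expectation.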

Bochner's Theorem establishes a correspondence between continuous, symmetric, 
positive definite functions and characteristic functions. 

Theorem 25.8.1 (Bochner's Theorem). Let the mapping $\Phi(\cdot)$ from $\mathbb{R}$ to $\mathbb{R}$ be
continuous. Then the following two conditions are equivalent:

a) $\Phi(\cdot)$ is the characteristic function of some RV having a symmetric
distribution.

b) $\Phi(\cdot)$ is a symmetric positive definite function satisfying $\Phi(0) = 1$.

Proof. See (Feller, 1971, Chapter XIX, Section 2) or (Loève, 1963, Chapter IV,
Section 14) or (Katznelson, 1976, Chapter VI, Section 2.8). □

Bochner's Theorem is the key to understanding autocovariance functions:

Proposition 25.8.2. Let $(X(t))$ be a WSS SP whose autocovariance function $K_{XX}$
is continuous. Then:

(i) There exists a symmetric RV $S$ such that

\[
K_{XX}(\tau) = K_{XX}(0) \, \mathrm{E}\bigl[e^{i 2\pi \tau S}\bigr], \quad \tau \in \mathbb{R}. \tag{25.37}
\]

(ii) If $K_{XX}(0) > 0$, then the distribution of $S$ in (25.37) is uniquely determined
by $K_{XX}$, and $(X(t))$ has a PSD if, and only if, $S$ has a density.

Proof. If $K_{XX}(0) = 0$, then $(X(t))$ is deterministic in the sense that for every
epoch $t \in \mathbb{R}$ the variance of $X(t)$ is zero. By the inequality $|K_{XX}(\tau)| \le K_{XX}(0)$
(Lemma 25.4.5, (25.15)) it follows that if $K_{XX}(0) = 0$ then $K_{XX}(\tau) = 0$ for all
$\tau \in \mathbb{R}$, and (25.37) holds in this case for any choice of $S$ and there is nothing else
to prove.

Consider now the case $K_{XX}(0) > 0$. To prove Part (i) we note that because $K_{XX}$ is
by assumption continuous, and because all autocovariance functions are symmetric
and positive definite (see (25.22) and (25.23)), it follows that the mapping

\[
\tau \mapsto \frac{K_{XX}(\tau)}{K_{XX}(0)}
\]

is a continuous, symmetric, positive definite mapping that takes on the value one
at $\tau = 0$. Consequently, by Bochner's Theorem, there exists a RV $R$ of a symmetric
distribution such that

\[
\frac{K_{XX}(\tau)}{K_{XX}(0)} = \mathrm{E}\bigl[e^{i \tau R}\bigr], \quad \tau \in \mathbb{R}.
\]

It follows that if we define $S$ as $R/(2\pi)$ then (25.37) will hold, and Part (i) is thus
also established for the case where $K_{XX}(0) > 0$.

We now conclude the treatment of the case $K_{XX}(0) > 0$ by proving Part (ii) for
this case. That the distribution of $S$ is unique follows because (25.37) implies that

\[
\mathrm{E}\bigl[e^{i \varpi S}\bigr] = \frac{K_{XX}\bigl(\varpi/(2\pi)\bigr)}{K_{XX}(0)}, \quad \varpi \in \mathbb{R},
\]

so $K_{XX}$ determines the characteristic function of $S$ and hence also its distribution
(Theorem 17.4.4).

Because the distribution of $S$ is symmetric, if $S$ has a density then it also has a
symmetric density. Denote by $f_S(\cdot)$ a symmetric density function for $S$. In terms
of $f_S(\cdot)$ we can rewrite (25.37) as

\[
K_{XX}(\tau) = \int_{-\infty}^{\infty} K_{XX}(0) \, f_S(s) \, e^{i 2\pi s \tau} \, ds, \quad \tau \in \mathbb{R},
\]

so the nonnegative symmetric function $K_{XX}(0) \, f_S(\cdot)$ is a PSD of $(X(t))$.
Conversely, if $(X(t))$ has PSD $S_{XX}$, then

\[
K_{XX}(\tau) = \int_{-\infty}^{\infty} S_{XX}(f) \, e^{i 2\pi f \tau} \, df, \quad \tau \in \mathbb{R}, \tag{25.38}
\]

and (25.37) holds with $S$ having the density

\[
f_S(s) = \frac{S_{XX}(s)}{K_{XX}(0)}, \quad s \in \mathbb{R}. \tag{25.39}
\]

(The RHS of (25.39) is symmetric, nonnegative, and integrates to 1 by (25.30).) □

Proposition 25.8.2 motivates us to define the spectral distribution function of a 
continuous autocovariance function (or of a WSS SP having such an autocovariance 
function) as follows. 

Definition 25.8.3 (Spectral Distribution Function). The spectral distribution
function of a continuous autocovariance function $K_{XX}$ is the mapping

\[
\xi \mapsto K_{XX}(0) \, \Pr[S \le \xi], \tag{25.40}
\]

where $S$ is a random variable for which (25.37) holds.



25.9 The Average Power 

We next address the average power in the sample-paths of a SP. We would like to
better understand formal expressions of the form

\[
\frac{1}{T} \int_{-T/2}^{T/2} X^2(\omega, t) \, dt
\]

for a SP $(X(t))$ defined on the probability space $(\Omega, \mathcal{F}, P)$. Recalling that if we fix
$\omega \in \Omega$ then we can view the trajectory $t \mapsto X(\omega, t)$ as a function of time, we would
like to think about the integral above as the time-integral of the square of the
trajectory $t \mapsto X(\omega, t)$. Since the result of this integral is a (nonnegative) number
that depends on $\omega$, we would like to view this result as a nonnegative RV

\[
\omega \mapsto \frac{1}{T} \int_{-T/2}^{T/2} X^2(\omega, t) \, dt, \quad \omega \in \Omega.
\]

Mathematicians, however, would object to our naive approach on two grounds. The
first is that it is prima facie unclear whether for every fixed $\omega \in \Omega$ the mapping
$t \mapsto X^2(\omega, t)$ is sufficiently well-behaved to allow us to discuss its integral. (It may
not be Lebesgue measurable.) The second is that, even if this integral could be
carried out for every $\omega \in \Omega$, it is prima facie unclear that the result would be a
RV. While it would certainly be a mapping from $\Omega$ to the extended reals (allowing
for $+\infty$), it is not clear that it would satisfy the technical measurability conditions
that random variables must meet.⁵



⁵ By "X is a random variable possibly taking on the value $+\infty$" we mean that $X$ is a mapping
from $\Omega$ to $\mathbb{R} \cup \{+\infty\}$ with the set $\{\omega \in \Omega : X(\omega) \le \xi\}$ being an event for every $\xi \in \mathbb{R}$ and with
the set $\{\omega \in \Omega : X(\omega) = +\infty\}$ also being an event.

To address these objections we shall assume that $(X(t))$ is a "measurable stochastic
process." This is a technical condition that will be foreign to most readers and
that will be inessential to the rest of this book. We mention it here because, in
order to be mathematically honest, we shall have to slip this attribute into some
of the theorems that we shall later state. Nothing will be lost on readers who
replace "measurable stochastic process" with "stochastic process satisfying a mild
technical condition."

Fortunately, this technical condition is, indeed, very mild. For example,
Proposition 25.7.3 still holds if we slip in the attribute "measurable" before the words
"Gaussian process." Similarly, in Theorem 25.6.2, if we add the hypothesis that
the given function is continuous at the origin, then we can slip in the attribute
"measurable" before the words "stationary Gaussian stochastic process."⁶

For the benefit of readers who are familiar with Measure Theory, we provide the 
following definition. 

Definition 25.9.1 (Measurable SP). Let $(X(t), t \in \mathbb{R})$ be a SP defined over the
probability space $(\Omega, \mathcal{F}, P)$. We say that the process is a measurable stochastic
process if the mapping $(\omega, t) \mapsto X(\omega, t)$ is a measurable mapping from $\Omega \times \mathbb{R}$ to $\mathbb{R}$
when the range $\mathbb{R}$ is endowed with the Borel $\sigma$-algebra on $\mathbb{R}$ and when the domain
$\Omega \times \mathbb{R}$ is endowed with the $\sigma$-algebra defined by the product of $\mathcal{F}$ on $\Omega$ by the Borel
$\sigma$-algebra on $\mathbb{R}$.

The nice thing about measurable stochastic processes is that if $(X(t))$ is a
measurable SP, then for every $\omega \in \Omega$ the trajectory $t \mapsto X(\omega, t)$ is a Borel (and hence also
Lebesgue) measurable function of time; see (Halmos, 1950, Chapter 7, Section 34,
Theorem B) or (Billingsley, 1995, Chapter 3, Section 18, Theorem 18.1 (ii)).
Moreover, for such processes we can sometimes use Fubini's Theorem to swap the order
in which we compute time-integrals and expectations; see (Halmos, 1950,
Chapter 7, Section 36) or (Billingsley, 1995, Chapter 3, Section 18, Theorem 18.3 (ii)).

We can now state the main result of this section regarding the average power in a 
WSS SP. 

Proposition 25.9.2 (Power in a Centered WSS SP). If $(X(t))$ is a measurable,
centered, WSS SP defined over the probability space $(\Omega, \mathcal{F}, P)$ and having the
autocovariance function $K_{XX}$, then for every $a, b \in \mathbb{R}$ satisfying $a < b$ the mapping

\[
\omega \mapsto \frac{1}{b - a} \int_{a}^{b} X^2(\omega, t) \, dt \tag{25.41}
\]

defines a RV (possibly taking on the value $+\infty$) satisfying

\[
\mathrm{E}\Biggl[\frac{1}{b - a} \int_{a}^{b} X^2(t) \, dt\Biggr] = K_{XX}(0). \tag{25.42}
\]

⁶ These are but very special cases of a much more general result that states that given FDDs
corresponding to a WSS SP of an autocovariance that is continuous at the origin, there exists
a SP of the given FDDs that is also measurable. See, for example, (Doob, 1990, Chapter II,
Section § 2, Theorem 2.6). (Replacing the values $\pm\infty$ with zero may ruin the separability but
not the measurability.)

Proof. The proof of (25.42) is straightforward and merely requires swapping the
order of integration and expectation. This swap can be justified using Fubini's
Theorem. Heuristically, the swapping of expectation and integration can be justified by
thinking about the integral as being a Riemann integral that can be approximated
by finite sums and by then recalling the linearity of expectation that guarantees
that the expectation of a finite sum is the sum of the expectations. We then have

\begin{align*}
\mathrm{E}\Biggl[\int_{a}^{b} X^2(t) \, dt\Biggr] &= \int_{a}^{b} \mathrm{E}[X^2(t)] \, dt \\
&= \int_{a}^{b} K_{XX}(0) \, dt \\
&= (b - a) \, K_{XX}(0),
\end{align*}

where the first equality follows by swapping the integration with the expectation;
the second because our assumption that $(X(t))$ is centered implies that for every
$t \in \mathbb{R}$ the RV $X(t)$ is centered and by (25.13); and the final equality because the
integrand is constant.

That (25.41) is a RV (possibly taking on the value $+\infty$) follows from Fubini's
Theorem. □

Recalling Definition 14.6.1 of the power in a SP as

\[
\lim_{T \to \infty} \mathrm{E}\Biggl[\frac{1}{T} \int_{-T/2}^{T/2} X^2(t) \, dt\Biggr]
\]

we conclude:



Corollary 25.9.3. The power in a centered, measurable, WSS SP (X(t)) of auto- 
covariance function Kxx is equal to Kxx(0). 
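The distinction between the time-average along a single sample-path and its expectation can be sketched with a toy process. The example below assumes a hypothetical random-phase sinusoid $X(t) = \sqrt{2} \cos(2\pi f_0 t + \Theta)$ with $\Theta$ uniform over eight equally spaced phases, which is centered and WSS with $K_{XX}(0) = 1$; the constants, the midpoint Riemann sum, and the function names are illustrative assumptions.

```python
import math

F0 = 1.5        # hypothetical frequency of the sinusoid
N_PHASES = 8    # number of equally likely phases (any value >= 3 works)
PHASES = [2.0 * math.pi * k / N_PHASES for k in range(N_PHASES)]


def x(theta, t):
    """Sample path of X(t) = sqrt(2) cos(2 pi F0 t + Theta) for Theta = theta."""
    return math.sqrt(2.0) * math.cos(2.0 * math.pi * F0 * t + theta)


def time_average_power(theta, a, b, n_steps=1000):
    """Midpoint Riemann sum for (1/(b-a)) * integral over [a, b] of X^2(omega, t) dt."""
    dt = (b - a) / n_steps
    total = sum(x(theta, a + (k + 0.5) * dt) ** 2 for k in range(n_steps))
    return total * dt / (b - a)


a, b = 0.0, 0.4
# One value of the mapping (25.41) per sample path (i.e., per phase).
per_path = [time_average_power(theta, a, b) for theta in PHASES]
# Expectation over the phase, as on the LHS of (25.42).
expected_power = sum(per_path) / N_PHASES
print(per_path)
print(expected_power)
```

The individual entries of `per_path` differ from one another because the interval is not a whole number of periods, yet their average equals $K_{XX}(0) = 1$ up to rounding, in line with (25.42).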



25.10 Linear Functionals 

For the problem of detecting continuous-time signals corrupted by noise, we shall
be interested in stochastic integrals of the form

\[
\int_{-\infty}^{\infty} X(t) \, s(t) \, dt \tag{25.43}
\]

for WSS stochastic processes $(X(t))$ defined over a probability space $(\Omega, \mathcal{F}, P)$
and for properly well-behaved deterministic functions $s(\cdot)$. We would like to think
about the result of such an integral as defining a RV

\[
\omega \mapsto \int_{-\infty}^{\infty} X(\omega, t) \, s(t) \, dt \tag{25.44}
\]

that maps each $\omega \in \Omega$ to the real number that is the result of the integration
over time of the product of the trajectory $t \mapsto X(\omega, t)$ corresponding to $\omega$ by the
deterministic function $t \mapsto s(t)$. That is, each $\omega$ is mapped to the inner product
between its trajectory $t \mapsto X(\omega, t)$ and the function $s(\cdot)$.

This is an excellent way of thinking about such integrals, but we do run into
some mathematical objections similar to those we encountered in Section 25.9.
For example, it is not obvious that for each $\omega \in \Omega$ the mapping $t \mapsto X(\omega, t) \, s(t)$
is a sufficiently well-behaved function for the time-integral to be defined. As we
shall see, for this reason we must impose certain restrictions on $s(\cdot)$, and we will
not claim that $t \mapsto X(\omega, t) \, s(t)$ is integrable for every $\omega \in \Omega$ but only for $\omega$'s in
some subset of $\Omega$ having probability one. Also, even if this issue is addressed, it is
unclear that the mapping of $\omega$ to the result of the integration is a RV. While it is
clearly a mapping from $\Omega$ to the reals, it is unclear that it satisfies the additional
mathematical requirement of measurability, i.e., that for every $\xi \in \mathbb{R}$ the set

\[
\Biggl\{\omega \in \Omega : \int_{-\infty}^{\infty} X(\omega, t) \, s(t) \, dt \le \xi\Biggr\}
\]

be an event, i.e., an element of $\mathcal{F}$.

We ask the reader to take it on faith that these issues can be resolved and to focus 
on the relatively straightforward computation of the mean and variance of (25.44). 
The resolution of the measurability issues is provided in Proposition 25.10.1, whose 
proof is recommended only to readers with background in Measure Theory. 

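We shall assume throughout that (X(t)) is WSS and that the deterministic function
$s\colon \mathbb{R} \to \mathbb{R}$ is integrable. We begin by heuristically deriving the mean:

\[
\begin{aligned}
\mathrm{E}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right]
&= \int_{-\infty}^{\infty} \mathrm{E}[X(t)\, s(t)]\, dt \\
&= \int_{-\infty}^{\infty} \mathrm{E}[X(t)]\, s(t)\, dt \\
&= \mathrm{E}[X(0)] \int_{-\infty}^{\infty} s(t)\, dt,
\end{aligned} \tag{25.45}
\]

with the following heuristic justification. The first equality follows by swapping
the expectation with the time-integration; the second because s(·) is deterministic;
and the last equality from our assumption that (X(t)) is WSS, which implies that
(X(t)) is of constant mean: $\mathrm{E}[X(t)] = \mathrm{E}[X(0)]$ for all $t \in \mathbb{R}$.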

We next heuristically derive the variance of the integral in terms of the autocovari- 
ance function Kxx of the process (X(t)). We begin by considering the case where 
(X(t)) is of zero mean. In this case we have 



\[
\begin{aligned}
\mathrm{Var}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right]
&= \mathrm{E}\!\left[\left(\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right)^{2}\right] \\
&= \mathrm{E}\!\left[\left(\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right)\left(\int_{-\infty}^{\infty} X(\tau)\, s(\tau)\, d\tau\right)\right] \\
&= \mathrm{E}\!\left[\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} X(t)\, s(t)\, X(\tau)\, s(\tau)\, dt\, d\tau\right] \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} s(t)\, s(\tau)\, \mathrm{E}[X(t)\, X(\tau)]\, dt\, d\tau \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} s(t)\, K_{XX}(t-\tau)\, s(\tau)\, dt\, d\tau,
\end{aligned} \tag{25.46}
\]

where the first equality follows because (25.45) and our assumption that (X(t))
is centered combine to guarantee that $\int X(t)\, s(t)\, dt$ is of zero mean; the second
by writing $a^2$ as $a$ times $a$; the third by writing the product of integrals over $\mathbb{R}$
as a double integral (i.e., as an integral over $\mathbb{R}^2$); the fourth by swapping the
double-integral with the expectation; and the final equality by the definition of the
autocovariance function (Definition 25.4.4) and because (X(t)) is centered.

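There are two equivalent ways of writing the RHS of (25.46) that we wish to point
out. The first is obtained from (25.46) by changing the integration variables from
$(t, \tau)$ to $(\sigma, \tau)$, where $\sigma = t - \tau$, and by performing the integration first over $\tau$ and
then over $\sigma$:

\[
\begin{aligned}
\mathrm{Var}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right]
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} s(t)\, K_{XX}(t-\tau)\, s(\tau)\, dt\, d\tau \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} s(\sigma+\tau)\, K_{XX}(\sigma)\, s(\tau)\, d\sigma\, d\tau \\
&= \int_{-\infty}^{\infty} K_{XX}(\sigma) \int_{-\infty}^{\infty} s(\sigma+\tau)\, s(\tau)\, d\tau\, d\sigma \\
&= \int_{-\infty}^{\infty} K_{XX}(\sigma)\, R_{ss}(\sigma)\, d\sigma,
\end{aligned} \tag{25.47}
\]

where $R_{ss}$ is the self-similarity function of s (Definition 11.2.1 and Section 11.4).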
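The equivalence of (25.46) and (25.47) can be checked numerically. The following is a minimal sketch under assumptions that are not part of the text: the autocovariance is taken to be $K_{XX}(\tau) = e^{-|\tau|}$, the signal s is taken to be the unit pulse on [0, 1), and both integrals are replaced by midpoint Riemann sums. For this particular choice the double integral can be evaluated by hand and equals 2/e.

```python
import numpy as np

def variance_double_integral(s, K, t):
    """Riemann-sum approximation of the double integral in (25.46)."""
    dt = t[1] - t[0]
    w = s(t) * dt                        # weights s(t_i) dt
    Kmat = K(t[:, None] - t[None, :])    # matrix of K_XX(t_i - t_j)
    return float(w @ Kmat @ w)

def variance_self_similarity(s, K, t):
    """Riemann-sum approximation of the single integral in (25.47)."""
    dt = t[1] - t[0]
    sv = s(t)
    # R_ss(sigma) = int s(sigma + tau) s(tau) dtau at the lags (i - (n - 1)) dt
    Rss = np.correlate(sv, sv, mode="full") * dt
    lags = (np.arange(Rss.size) - (sv.size - 1)) * dt
    return float(np.sum(K(lags) * Rss) * dt)

# Illustrative assumptions (not from the text):
K = lambda tau: np.exp(-np.abs(tau))                    # assumed autocovariance
s = lambda t: ((t >= 0.0) & (t < 1.0)).astype(float)    # assumed signal: unit pulse

t = (np.arange(1000) + 0.5) * 2e-3 - 0.5    # midpoints of a grid on [-0.5, 1.5]
v46 = variance_double_integral(s, K, t)
v47 = variance_self_similarity(s, K, t)
exact = 2.0 / np.e                           # closed form for this K and s
```

The two Riemann sums agree to rounding error because the second is only a regrouping of the first by lag, which mirrors the change of variables $\sigma = t - \tau$.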

The second equivalent way of writing (25.46) can be derived from (25.47) when
(X(t)) is of PSD $S_{XX}$. Since (25.47) has the form of an inner product, we can use
Proposition 6.2.4 to write this inner product in the frequency domain by noting
that the FT of $R_{ss}$ is $f \mapsto |\hat{s}(f)|^2$ (see (11.35)) and that $K_{XX}$ is the IFT of its
PSD $S_{XX}$. The result is that

\[
\mathrm{Var}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right]
= \int_{-\infty}^{\infty} S_{XX}(f)\, |\hat{s}(f)|^{2}\, df. \tag{25.48}
\]
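A frequency-domain check of (25.48) can be sketched in the same spirit. The assumptions here are illustrative and not taken from the text: the PSD is $S_{XX}(f) = 2/(1 + (2\pi f)^2)$, which is the FT of the autocovariance $e^{-|\tau|}$, and s is the unit pulse on [0, 1), whose FT has squared magnitude $\mathrm{sinc}^2(f)$ with NumPy's normalized sinc. The time-domain value of the variance for this pair is 2/e, so the frequency-domain integral should reproduce it.

```python
import numpy as np

def variance_frequency_domain(S_xx, s_hat_abs2, f):
    """Midpoint Riemann sum for the integral in (25.48) on the grid f."""
    df = f[1] - f[0]
    return float(np.sum(S_xx(f) * s_hat_abs2(f)) * df)

# Illustrative assumptions (not from the text):
S_xx = lambda f: 2.0 / (1.0 + (2.0 * np.pi * f) ** 2)   # FT of exp(-|tau|)
s_hat_abs2 = lambda f: np.sinc(f) ** 2                   # |FT of the unit pulse|^2

f = (np.arange(400_000) + 0.5) * 1e-3 - 200.0            # midpoints on [-200, 200]
v48 = variance_frequency_domain(S_xx, s_hat_abs2, f)
```

The integrand decays like $1/f^4$, so truncating the frequency axis at ±200 contributes a negligible error in this example.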



We next show that (25.46) (and hence also (25.47) & (25.48), which are equivalent
ways of writing (25.46)) remains valid also when (X(t)) is of mean $\mu$ (not
necessarily zero). To see this we can consider the zero-mean SP $(\tilde{X}(t))$ defined at every
epoch $t \in \mathbb{R}$ by $\tilde{X}(t) = X(t) - \mu$ and formally compute

\[
\begin{aligned}
\mathrm{Var}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right]
&= \mathrm{Var}\!\left[\int_{-\infty}^{\infty} \bigl(\tilde{X}(t) + \mu\bigr)\, s(t)\, dt\right] \\
&= \mathrm{Var}\!\left[\int_{-\infty}^{\infty} \tilde{X}(t)\, s(t)\, dt + \mu \int_{-\infty}^{\infty} s(t)\, dt\right] \\
&= \mathrm{Var}\!\left[\int_{-\infty}^{\infty} \tilde{X}(t)\, s(t)\, dt\right] \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} s(t)\, K_{\tilde{X}\tilde{X}}(t-\tau)\, s(\tau)\, dt\, d\tau \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} s(t)\, K_{XX}(t-\tau)\, s(\tau)\, dt\, d\tau,
\end{aligned} \tag{25.49}
\]

where the first equality follows from the definition of $\tilde{X}(t)$ as $X(t) - \mu$; the second
by the linearity of integration; the third because adding a deterministic quantity
to a RV does not change its variance; the fourth by (25.46) applied to the zero-mean
process $(\tilde{X}(t))$; and the final equality because the autocovariance function
of $(\tilde{X}(t))$ is the same as the autocovariance function of (X(t)) (Definition 25.4.4).

As above, once a result is proved for centered stochastic processes, its extension
to WSS stochastic processes with a mean can be straightforward. Consequently,
we shall often derive our results for centered WSS stochastic processes and leave
it to the reader to extend them to mean-$\mu$ stochastic processes by expressing such
stochastic processes as the sum of a zero-mean SP and the deterministic constant $\mu$.

As promised, we now state the results about the mean and variance of (25.44) in 
a mathematically defensible proposition. 

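Proposition 25.10.1 (Mean and Variance of Linear Functionals of a WSS SP).

Let (X(t)) be a measurable WSS SP defined over the probability space $(\Omega, \mathcal{F}, P)$
and having the autocovariance function $K_{XX}$. Let $s\colon \mathbb{R} \to \mathbb{R}$ be some deterministic
integrable function. Then:

(i) For every $\omega \in \Omega$ the mapping $t \mapsto X(\omega, t)\, s(t)$ is Lebesgue measurable.

(ii) The set

\[
\mathcal{N} = \left\{ \omega \in \Omega : \int_{-\infty}^{\infty} |X(\omega, t)\, s(t)|\, dt = \infty \right\} \tag{25.50}
\]

is an event and is of probability zero.

(iii) The mapping from $\Omega \setminus \mathcal{N}$ to $\mathbb{R}$ defined by

\[
\omega \mapsto \int_{-\infty}^{\infty} X(\omega, t)\, s(t)\, dt \tag{25.51}
\]

is measurable with respect to $\mathcal{F}$.

(iv) The mapping from $\Omega$ to $\mathbb{R}$ defined by

\[
\omega \mapsto
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} X(\omega, t)\, s(t)\, dt & \text{if } \omega \notin \mathcal{N}, \\
0 & \text{otherwise,}
\end{cases} \tag{25.52}
\]

defines a random variable.

(v) The mean of this RV is

\[
\mathrm{E}[X(0)] \int_{-\infty}^{\infty} s(t)\, dt.
\]

(vi) Its variance is

\[
\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} s(t)\, K_{XX}(t-\tau)\, s(\tau)\, d\tau\, dt, \tag{25.53}
\]

which can also be expressed as

\[
\int_{-\infty}^{\infty} K_{XX}(\sigma)\, R_{ss}(\sigma)\, d\sigma, \tag{25.54}
\]

where $R_{ss}$ is the self-similarity function of s.

(vii) If (X(t)) is of PSD $S_{XX}$, then the variance of this RV can be expressed as

\[
\int_{-\infty}^{\infty} S_{XX}(f)\, |\hat{s}(f)|^{2}\, df. \tag{25.55}
\]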



Proof. Part (i) follows because the measurability of the process (X(t)) guarantees
that for every $\omega \in \Omega$ the mapping $t \mapsto X(\omega, t)$ is Borel measurable and hence a
fortiori Lebesgue measurable; see (Billingsley, 1995, Chapter 3, Section 18,
Theorem 18.1 (ii)).

If s happens to be Borel measurable, then Parts (ii)-(v) follow directly by Fubini's
Theorem (Billingsley, 1995, Chapter 3, Section 18, Theorem 18.3) because in this
case the mapping $(\omega, t) \mapsto X(\omega, t)\, s(t)$ is measurable (with respect to the product
of $\mathcal{F}$ by the Borel $\sigma$-algebra on the real line) and because

\[
\begin{aligned}
\int_{-\infty}^{\infty} \mathrm{E}\bigl[|X(t)\, s(t)|\bigr]\, dt
&= \int_{-\infty}^{\infty} \mathrm{E}\bigl[|X(t)|\bigr]\, |s(t)|\, dt \\
&\le \sqrt{\mathrm{E}[X^2(0)]} \int_{-\infty}^{\infty} |s(t)|\, dt \\
&< \infty,
\end{aligned}
\]

where the first inequality follows from (25.16), and where the second inequality
follows from our assumption that s is integrable.

To prove Parts (i)-(v) for the case where s is Lebesgue measurable but not Borel
measurable, recall that every Lebesgue measurable function is equal (except on
a set of Lebesgue measure zero) to a Borel measurable function (Rudin, 1974,
Chapter 7, Lemma 1), and note that the RHS of (25.50) and the mappings in
(25.51) and (25.52) are unaltered when s is replaced with a function that is identical
to it outside a set of Lebesgue measure zero.

We next prove Part (vi) under the assumption that (X(t)) is centered. The more 
general case then follows from the argument leading to (25.49). To prove Part (vi) 
we need to justify the steps leading to (25.46). For the reader's convenience we 
repeat these steps here and then proceed to justify them. 



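\[
\begin{aligned}
\mathrm{Var}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right]
&= \mathrm{E}\!\left[\left(\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right)^{2}\right] \\
&= \mathrm{E}\!\left[\left(\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right)\left(\int_{-\infty}^{\infty} X(\tau)\, s(\tau)\, d\tau\right)\right] \\
&= \mathrm{E}\!\left[\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} X(t)\, s(t)\, X(\tau)\, s(\tau)\, dt\, d\tau\right] \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} s(t)\, s(\tau)\, \mathrm{E}[X(t)\, X(\tau)]\, dt\, d\tau \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} s(t)\, K_{XX}(t-\tau)\, s(\tau)\, dt\, d\tau.
\end{aligned}
\]

The first equality holds because for centered processes, by Part (v), the RV on the
LHS is of zero mean; the second follows by writing $a^2$ as $a$ times $a$; the third follows
because for $\omega$'s satisfying $\int |X(\omega, t)\, s(t)|\, dt < \infty$ we can use Fubini's Theorem to
replace the iterated integrals with a double integral and because other $\omega$'s occur
with zero probability and therefore do not influence the expectation; the fourth
equality entails swapping the expectation with the integration over $\mathbb{R}^2$ and can be
justified by Fubini's Theorem because, by (25.17),

\[
\begin{aligned}
\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} |s(t)\, s(\tau)|\, \mathrm{E}\bigl[|X(t)\, X(\tau)|\bigr]\, dt\, d\tau
&\le K_{XX}(0) \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} |s(t)|\, |s(\tau)|\, dt\, d\tau \\
&= K_{XX}(0)\, \|s\|_1^2 \\
&< \infty;
\end{aligned}
\]

and the final equality follows from the definition of the autocovariance function
(Definition 25.4.4).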

Having derived (25.53) we can derive (25.54) by following the steps leading to
(25.47). The only issue that needs clarification is the justification for replacing
the integral over $\mathbb{R}^2$ with the iterated integrals. This is justified using Fubini's
Theorem by noting that, by (25.15), $|K_{XX}(\sigma)| \le K_{XX}(0)$ and that s is integrable:

\[
\begin{aligned}
\int_{-\infty}^{\infty} |s(\tau)| \int_{-\infty}^{\infty} |s(\sigma+\tau)\, K_{XX}(\sigma)|\, d\sigma\, d\tau
&\le K_{XX}(0) \int_{-\infty}^{\infty} |s(\tau)| \int_{-\infty}^{\infty} |s(\sigma+\tau)|\, d\sigma\, d\tau \\
&= K_{XX}(0)\, \|s\|_1^2 \\
&< \infty.
\end{aligned}
\]

Finally, Part (vii) follows from (25.54) and from Proposition 6.2.4 by noting that,
by (11.34) & (11.35), $R_{ss}$ is integrable and of FT

\[
\hat{R}_{ss}(f) = |\hat{s}(f)|^{2}, \quad f \in \mathbb{R},
\]

and that, by Definition 25.7.2, if $S_{XX}$ is the PSD of (X(t)), then $S_{XX}$ is integrable
and its IFT is $K_{XX}$, i.e.,

\[
K_{XX}(\sigma) = \int_{-\infty}^{\infty} S_{XX}(f)\, e^{i 2\pi f \sigma}\, df, \quad \sigma \in \mathbb{R}. \qquad \Box
\]

Note 25.10.2.

(i) In the future we shall sometimes write

\[
\int_{-\infty}^{\infty} X(t)\, s(t)\, dt
\]

instead of the mathematically more explicit (25.52) & (25.50). Sometimes,
however, we shall make the argument $\omega \in \Omega$ more explicit:

\[
\left(\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right)(\omega) =
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} X(\omega, t)\, s(t)\, dt & \text{if } \displaystyle\int_{-\infty}^{\infty} |X(\omega, t)\, s(t)|\, dt < \infty, \\
0 & \text{otherwise.}
\end{cases}
\]

(ii) If $s_1$ and $s_2$ are indistinguishable integrable real signals (Definition 2.5.2),
then the random variables $\int_{-\infty}^{\infty} X(t)\, s_1(t)\, dt$ and $\int_{-\infty}^{\infty} X(t)\, s_2(t)\, dt$ are
identical.

(iii) For every $\alpha \in \mathbb{R}$

\[
\int_{-\infty}^{\infty} X(t)\, \bigl(\alpha\, s(t)\bigr)\, dt = \alpha \int_{-\infty}^{\infty} X(t)\, s(t)\, dt. \tag{25.56}
\]

(iv) We caution the very careful readers that if $s_1$ and $s_2$ are integrable
functions, then there may be some $\omega$'s in $\Omega$ for which the stochastic integral
$\bigl(\int_{-\infty}^{\infty} X(t)\, (s_1(t) + s_2(t))\, dt\bigr)(\omega)$ is not equal to the sum of the stochastic
integrals $\bigl(\int_{-\infty}^{\infty} X(t)\, s_1(t)\, dt\bigr)(\omega)$ and $\bigl(\int_{-\infty}^{\infty} X(t)\, s_2(t)\, dt\bigr)(\omega)$. This can
happen, for example, if the trajectory $t \mapsto X(\omega, t)$ corresponding to $\omega$ is such
that either $\int |X(\omega, t)\, s_1(t)|\, dt$ or $\int |X(\omega, t)\, s_2(t)|\, dt$ is infinite, but not both.
Fortunately, as we shall see in Lemma 25.10.3, such $\omega$'s occur with zero
probability.

(v) The value that we have chosen to assign to the integral in (25.52) when $\omega$ is
in $\mathcal{N}$ is immaterial. Such $\omega$'s occur with zero probability, so this value does
not influence the distribution of the integral.⁷

Lemma 25.10.3 ("Almost" Linearity of Stochastic Integration). Let (X(t)) be
a measurable WSS SP, let $s_1, \ldots, s_m\colon \mathbb{R} \to \mathbb{R}$ be integrable, and let $\gamma_1, \ldots, \gamma_m$ be
real. Then the random variables

\[
\omega \mapsto \left(\int_{-\infty}^{\infty} X(t) \Bigl(\sum_{j=1}^{m} \gamma_j\, s_j(t)\Bigr)\, dt\right)(\omega) \tag{25.57}
\]

and

\[
\omega \mapsto \sum_{j=1}^{m} \gamma_j \left(\int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt\right)(\omega) \tag{25.58}
\]

differ on at most a set of $\omega$'s of probability zero. In particular, the two random
variables have the same distribution.

Note 25.10.4. In view of this lemma we shall write, somewhat imprecisely,

\[
\int_{-\infty}^{\infty} X(t)\, \bigl(\alpha_1 s_1(t) + \alpha_2 s_2(t)\bigr)\, dt
= \alpha_1 \int_{-\infty}^{\infty} X(t)\, s_1(t)\, dt + \alpha_2 \int_{-\infty}^{\infty} X(t)\, s_2(t)\, dt.
\]

⁷ The value zero is convenient because it guarantees that (25.56) holds even for $\omega$'s for which
the mapping $t \mapsto X(\omega, t)\, s(t)$ is not integrable.

Proof of Lemma 25.10.3. Let $(\Omega, \mathcal{F}, P)$ be the probability space over which the
SP (X(t)) is defined. Define the function

\[
s_0\colon t \mapsto \sum_{j=1}^{m} \gamma_j\, s_j(t) \tag{25.59}
\]

and the sets

\[
\mathcal{N}_j = \left\{ \omega \in \Omega : \int_{-\infty}^{\infty} |X(\omega, t)\, s_j(t)|\, dt = \infty \right\}, \quad j = 0, 1, \ldots, m.
\]

By (25.59) and the Triangle Inequality (2.12)

\[
|X(\omega, t)\, s_0(t)| \le \sum_{j=1}^{m} |\gamma_j|\, |X(\omega, t)\, s_j(t)|, \quad \omega \in \Omega,\ t \in \mathbb{R},
\]

which implies that

\[
\mathcal{N}_0 \subseteq \bigcup_{j=1}^{m} \mathcal{N}_j.
\]

By the Union Bound (or more specifically by Corollary 21.5.2 (i)), the set on the
RHS is of probability zero. The proof is concluded by noting that, outside this
set, the random variables (25.57) and (25.58) are identical. This follows because,
for $\omega$'s outside this set, all the integrals are finite so linearity holds. □

25.11 Linear Functionals of Gaussian Processes 

We continue our discussion of integrals of the form $\int X(t)\, s(t)\, dt$, but this time with
the additional assumption that (X(t)) is Gaussian. The main result of this section
is Proposition 25.11.1, which states that, subject to some technical conditions, the
result of this integral is a Gaussian RV. In fact, Proposition 25.11.1 is a bit more
general and addresses expressions of the form

\[
\int_{-\infty}^{\infty} X(t)\, s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu\, X(t_\nu), \tag{25.60}
\]

where (X(t)) is a stationary Gaussian process, $s\colon \mathbb{R} \to \mathbb{R}$ is integrable, n is an
arbitrary nonnegative integer, and the coefficients $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ and the epochs
$t_1, \ldots, t_n \in \mathbb{R}$ are arbitrary. It shows that, subject to the additional technical
condition that (X(t)) is measurable, the result of (25.60) is a Gaussian RV.
Consequently, its distribution is fully specified by its mean and variance, which, as we
shall see, can be easily computed from the autocovariance function $K_{XX}$.

The proof of the Gaussianity of (25.60) (Proposition 25.11.1 ahead) is technical,
so we encourage the reader to focus on the following heuristic argument. Suppose
that the integral is a Riemann integral and that we can therefore approximate it
with a finite sum

\[
\int_{-\infty}^{\infty} X(t)\, s(t)\, dt \approx \sum_{k=-K}^{K} \delta\, X(\delta k)\, s(\delta k)
\]

for some large enough K and small enough $\delta > 0$. (Do not bother trying to sort
out the exact sense in which this approximation holds. This is, after all, a heuristic
argument.) Consequently, we can approximate (25.60) by

\[
\int_{-\infty}^{\infty} X(t)\, s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu\, X(t_\nu)
\approx \sum_{k=-K}^{K} \delta\, s(\delta k)\, X(\delta k) + \sum_{\nu=1}^{n} \alpha_\nu\, X(t_\nu). \tag{25.61}
\]

But the RHS of the above is just a linear combination of the random variables

\[
X(-K\delta), \ldots, X(K\delta),\ X(t_1), \ldots, X(t_n),
\]

which are jointly Gaussian because (X(t)) is a Gaussian SP. Since a linear
functional of jointly Gaussian random variables is Gaussian (Theorem 23.6.17), the
RHS of (25.61) is Gaussian, thus making it plausible that its LHS is also Gaussian.
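The heuristic can be illustrated with a small simulation. The sketch below rests on assumptions that are not in the text: the Gaussian process is taken to have autocovariance $K_{XX}(\tau) = e^{-|\tau|}$, s is the unit pulse on [0, 1) (so only grid points in [0, 1] contribute to the Riemann sum), $\delta = 0.01$, and there are no point samples (n = 0). Samples of the process on the grid are generated with a Cholesky factor of the covariance matrix, and the empirical variance and excess kurtosis of the Riemann sum are compared with the values expected of a Gaussian RV.

```python
import numpy as np

rng = np.random.default_rng(0)

delta = 0.01
t = (np.arange(100) + 0.5) * delta               # grid covering the support [0, 1] of s
C = np.exp(-np.abs(t[:, None] - t[None, :]))     # assumed K_XX(tau) = exp(-|tau|)
L = np.linalg.cholesky(C)                        # C = L L^T

w = delta * np.ones_like(t)                      # weights delta * s(delta k) for the unit pulse
Z = rng.standard_normal((20_000, t.size))
X = Z @ L.T                                      # each row: one sample of the process on the grid
Y = X @ w                                        # one Riemann sum per sample path

sample_var = float(Y.var())
exact_sum_var = float(w @ C @ w)                 # exact variance of the Riemann sum
excess_kurtosis = float(np.mean((Y - Y.mean()) ** 4) / sample_var ** 2 - 3.0)
```

Because the Riemann sum is a fixed linear combination of jointly Gaussian samples, its exact variance is the quadratic form `w @ C @ w`, which for this example is close to 2/e, and its excess kurtosis should be near zero up to sampling noise.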

Before stating the main result of this section in a mathematically defensible way,
we now proceed to compute the mean and variance of (25.60). We assume that s(·)
is integrable and that (X(t)) is measurable and WSS. (Gaussianity is inessential
for the computation of the mean and variance.) The computation is very similar
to the one leading to (25.45) and (25.46). For the mean we have:

\[
\begin{aligned}
\mathrm{E}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu\, X(t_\nu)\right]
&= \mathrm{E}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right] + \sum_{\nu=1}^{n} \alpha_\nu\, \mathrm{E}[X(t_\nu)] \\
&= \mathrm{E}[X(0)] \left(\int_{-\infty}^{\infty} s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu\right),
\end{aligned} \tag{25.62}
\]

where the first equality follows from the linearity of expectation and where the
second equality follows from (25.45) and from the wide-sense stationarity of (X(t)),
which implies that $\mathrm{E}[X(t)] = \mathrm{E}[X(0)]$ for all $t \in \mathbb{R}$.

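For the purpose of computing the variance of (25.60), we assume that (X(t)) is
centered. The result continues to hold if (X(t)) has a nonzero mean, because the
mean of (X(t)) does not influence the variance of (25.60). We begin by expanding
the variance as

\[
\begin{aligned}
\mathrm{Var}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu\, X(t_\nu)\right]
&= \mathrm{Var}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right]
 + \mathrm{Var}\!\left[\sum_{\nu=1}^{n} \alpha_\nu\, X(t_\nu)\right] \\
&\quad + 2 \sum_{\nu=1}^{n} \alpha_\nu\, \mathrm{Cov}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt,\ X(t_\nu)\right]
\end{aligned} \tag{25.63}
\]

and by noting that, by (25.47),

\[
\mathrm{Var}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right] = \int_{-\infty}^{\infty} K_{XX}(\sigma)\, R_{ss}(\sigma)\, d\sigma \tag{25.64}
\]

and that, by (25.24),

\[
\mathrm{Var}\!\left[\sum_{\nu=1}^{n} \alpha_\nu\, X(t_\nu)\right] = \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu\, \alpha_{\nu'}\, K_{XX}(t_\nu - t_{\nu'}). \tag{25.65}
\]

To complete the computation of the variance of (25.60) it remains to compute the
covariance in the last term in (25.63):

\[
\begin{aligned}
\mathrm{Cov}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt,\ X(t_\nu)\right]
&= \mathrm{E}\!\left[X(t_\nu) \int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right] \\
&= \mathrm{E}\!\left[\int_{-\infty}^{\infty} X(t)\, X(t_\nu)\, s(t)\, dt\right] \\
&= \int_{-\infty}^{\infty} s(t)\, \mathrm{E}[X(t)\, X(t_\nu)]\, dt \\
&= \int_{-\infty}^{\infty} s(t)\, K_{XX}(t - t_\nu)\, dt.
\end{aligned} \tag{25.66}
\]

Combining (25.63) with (25.64)-(25.66) we obtain

\[
\begin{aligned}
\mathrm{Var}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu\, X(t_\nu)\right]
&= \int_{-\infty}^{\infty} K_{XX}(\sigma)\, R_{ss}(\sigma)\, d\sigma
 + \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu\, \alpha_{\nu'}\, K_{XX}(t_\nu - t_{\nu'}) \\
&\quad + 2 \sum_{\nu=1}^{n} \alpha_\nu \int_{-\infty}^{\infty} s(t)\, K_{XX}(t - t_\nu)\, dt.
\end{aligned} \tag{25.67}
\]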
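The three terms of (25.67) can be evaluated numerically for a concrete case. The following is a minimal sketch under assumed inputs that do not come from the text: $K_{XX}(\tau) = e^{-|\tau|}$, s the unit pulse on [0, 1), a single point sample with $\alpha_1 = 1$ at epoch $t_1 = 2$, and midpoint Riemann sums in place of the integrals. The helper name `functional_variance` is hypothetical. For these inputs a direct calculation gives $1 + 4/e - 2/e^2$.

```python
import numpy as np

def functional_variance(K, s, t, alphas, epochs):
    """Approximate the RHS of (25.67) by midpoint Riemann sums on the grid t."""
    dt = t[1] - t[0]
    sv = s(t)
    alphas = np.asarray(alphas, dtype=float)
    epochs = np.asarray(epochs, dtype=float)
    # Variance of the integral, written in the equivalent double-integral form (25.46)
    term_integral = dt * dt * (sv @ K(t[:, None] - t[None, :]) @ sv)
    # Variance of the weighted point samples, as in (25.65)
    term_samples = alphas @ K(epochs[:, None] - epochs[None, :]) @ alphas
    # Covariances between the integral and each point sample, as in (25.66)
    cross = np.array([np.sum(sv * K(t - tv)) * dt for tv in epochs])
    return float(term_integral + term_samples + 2.0 * alphas @ cross)

# Illustrative assumptions (not from the text):
K = lambda tau: np.exp(-np.abs(tau))
s = lambda t: ((t >= 0.0) & (t < 1.0)).astype(float)

t = (np.arange(500) + 0.5) * 2e-3                 # midpoints on [0, 1]
v = functional_variance(K, s, t, alphas=[1.0], epochs=[2.0])
closed_form = 1.0 + 4.0 / np.e - 2.0 / np.e ** 2
```

Passing empty `alphas` and `epochs` reduces the computation to the variance of the integral alone.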



We now state the main result about linear functionals of Gaussian stochastic
processes. The proof is recommended for mathematically-inclined readers only.

Proposition 25.11.1 (Linear Functional of Stationary Gaussian Processes).
Consider the setup of Proposition 25.10.1 with the additional assumption that the
process (X(t)) is Gaussian. Additionally introduce the coefficients $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ and
the epochs $t_1, \ldots, t_n \in \mathbb{R}$ for some $n \in \mathbb{N}$. Then there exists an event $\mathcal{N} \in \mathcal{F}$ of
zero probability such that for all $\omega \notin \mathcal{N}$ the mapping $t \mapsto X(\omega, t)\, s(t)$ is a Lebesgue
integrable function:

\[
\bigl(\text{the mapping } t \mapsto X(\omega, t)\, s(t) \text{ is in } \mathcal{L}_1\bigr), \quad \omega \notin \mathcal{N}, \tag{25.68a}
\]

and the mapping from $\Omega$ to $\mathbb{R}$

\[
\omega \mapsto
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} X(\omega, t)\, s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu\, X(\omega, t_\nu) & \text{if } \omega \notin \mathcal{N}, \\
0 & \text{otherwise}
\end{cases} \tag{25.68b}
\]

is a Gaussian RV whose mean and variance are given in (25.62) and (25.67).

Proof. We prove this result when (X(t)) is centered. The extension to the more
general case follows by noting that adding a deterministic constant to a zero-mean
Gaussian results in a Gaussian. We also assume that s(·) is Borel measurable,
because once the theorem is established for this case it immediately also extends
to the case where s(·) is only Lebesgue measurable by noting that every Lebesgue
measurable function is equal almost everywhere to a Borel measurable function.

The existence of the event $\mathcal{N}$ and the fact that the mapping (25.68b) is a RV follow
from Proposition 25.10.1. We next show that the RV

\[
Y(\omega) =
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} X(\omega, t)\, s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu\, X(\omega, t_\nu) & \text{if } \omega \notin \mathcal{N}, \\
0 & \text{otherwise,}
\end{cases} \tag{25.69}
\]

is Gaussian.

To that end, define for every $k \in \mathbb{N}$ the function

\[
s_k(t) =
\begin{cases}
s(t) & \text{if } |t| \le k \text{ and } |s(t)| \le \sqrt{k}, \\
0 & \text{otherwise,}
\end{cases}
\quad t \in \mathbb{R}. \tag{25.70}
\]

Note that for every $\omega \in \Omega$

\[
\lim_{k \to \infty} X(\omega, t)\, s_k(t) = X(\omega, t)\, s(t), \quad t \in \mathbb{R},
\]

and

\[
|X(\omega, t)\, s_k(t)| \le |X(\omega, t)\, s(t)|, \quad t \in \mathbb{R},
\]

so, by the Dominated Convergence Theorem and (25.68a),

\[
\lim_{k \to \infty} \int_{-\infty}^{\infty} X(\omega, t)\, s_k(t)\, dt = \int_{-\infty}^{\infty} X(\omega, t)\, s(t)\, dt, \quad \omega \notin \mathcal{N}. \tag{25.71}
\]

Define now for every $k \in \mathbb{N}$ the RV

\[
Y_k(\omega) =
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} X(\omega, t)\, s_k(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu\, X(\omega, t_\nu) & \text{if } \omega \notin \mathcal{N}, \\
0 & \text{otherwise.}
\end{cases} \tag{25.72}
\]

It follows from (25.71) that the sequence $Y_1, Y_2, \ldots$ converges almost surely to Y.
To prove that Y is Gaussian, it thus suffices to prove that for every $k \in \mathbb{N}$ the RV
$Y_k$ is Gaussian (Theorem 19.9.1).
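The truncation (25.70) can be made concrete with a small numerical sketch. The signal used here is an assumption chosen for illustration and does not appear in the text: $s(t) = e^{-|t|}/\sqrt{|t|}$, which is integrable but unbounded near the origin. The sketch builds $s_k$ on a grid, and records both the $\mathcal{L}_1$ distance between s and $s_k$ and the energy of $s_k$, which by construction cannot exceed $2k \cdot (\sqrt{k})^2 = 2k^2$.

```python
import numpy as np

def truncate(s_vals, t, k):
    """s_k of (25.70): keep s(t) where |t| <= k and |s(t)| <= sqrt(k), else 0."""
    keep = (np.abs(t) <= k) & (np.abs(s_vals) <= np.sqrt(k))
    return np.where(keep, s_vals, 0.0)

dt = 1e-3
t = (np.arange(40_000) + 0.5) * dt - 20.0          # midpoints on [-20, 20]; t = 0 is never hit
s_vals = np.exp(-np.abs(t)) / np.sqrt(np.abs(t))   # assumed signal: integrable, unbounded at 0

ks = [1, 2, 4, 8]
l1_errors = [float(np.sum(np.abs(s_vals - truncate(s_vals, t, k))) * dt) for k in ks]
energies = [float(np.sum(truncate(s_vals, t, k) ** 2) * dt) for k in ks]
```

The $\mathcal{L}_1$ errors shrink as k grows, which is the pointwise convergence exploited via the Dominated Convergence Theorem, while the bounded energy of each $s_k$ is what makes the finite-variance argument for $Y_k$ possible.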

To prove that $Y_k$ is Gaussian, we begin by showing that it is of finite variance. To
that end, it suffices to show that the RV

\[
\tilde{Y}_k(\omega) \triangleq
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} X(\omega, t)\, s_k(t)\, dt & \text{if } \omega \notin \mathcal{N}, \\
0 & \text{otherwise}
\end{cases} \tag{25.73}
\]

is of finite variance. We prove this by using the definition of $s_k(\cdot)$ (25.70) and by
using the Cauchy-Schwarz Inequality to show that for every $\omega \notin \mathcal{N}$

\[
\begin{aligned}
\tilde{Y}_k^2(\omega)
&= \left(\int_{-\infty}^{\infty} X(\omega, t)\, s_k(t)\, dt\right)^{2} \\
&= \left(\int_{-k}^{k} X(\omega, t)\, s_k(t)\, dt\right)^{2} \\
&\le \left(\int_{-k}^{k} X^2(\omega, t)\, dt\right) \left(\int_{-k}^{k} s_k^2(t)\, dt\right) \\
&\le \left(\int_{-k}^{k} X^2(\omega, t)\, dt\right) 2k^2,
\end{aligned}
\]

where the equality in the first line follows from the definition of $\tilde{Y}_k$ (25.73); the
equality in the second line from the definition of $s_k(\cdot)$ (25.70); the inequality in the
third line from the Cauchy-Schwarz Inequality; and the final inequality again by
(25.70). Since $\mathcal{N}$ is an event of probability zero, it follows from this inequality that

\[
\mathrm{E}\bigl[\tilde{Y}_k^2\bigr] \le 4k^3\, K_{XX}(0) < \infty,
\]

thus establishing that $\tilde{Y}_k$, and hence also $Y_k$, is of finite variance.

To prove that $Y_k$ is Gaussian we shall use some results about the Hilbert space
$L^2(\Omega, \mathcal{F}, P)$ of (the equivalence classes of) the random variables that are defined
over $(\Omega, \mathcal{F}, P)$ and that have a finite second moment; see, for example, (Shiryaev,
1996, Chapter II, Section 11). Let $\mathcal{G}$ denote the closed linear subspace of $L^2(\Omega, \mathcal{F}, P)$
that is generated by the random variables $(X(t),\ t \in \mathbb{R})$. Thus, $\mathcal{G}$ contains all
finite linear combinations of the random variables $(X(t),\ t \in \mathbb{R})$ as well as the
mean-square limits of such linear combinations. Since the process $(X(t),\ t \in \mathbb{R})$
is Gaussian, it follows that all such linear combinations are Gaussian. And since
mean-square limits of Gaussian random variables are Gaussian (Theorem 19.9.1),
it follows that $\mathcal{G}$ contains only random variables that have a Gaussian
distribution (Shiryaev, 1996, Chapter II, Section 13, Paragraph 6). To prove that $Y_k$ is
Gaussian it thus suffices to show that it is an element of $\mathcal{G}$.



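To prove that $Y_k$ is an element of $\mathcal{G}$, decompose $Y_k$ as

\[
Y_k = Y_k^{\mathcal{G}} + Y_k^{\perp}, \tag{25.74}
\]

where $Y_k^{\mathcal{G}}$ is the projection of $Y_k$ onto $\mathcal{G}$ and where $Y_k^{\perp}$ is consequently
perpendicular to every element of $\mathcal{G}$ and a fortiori to all the random variables $(X(t),\ t \in \mathbb{R})$:

\[
\mathrm{E}\bigl[X(t)\, Y_k^{\perp}\bigr] = 0, \quad t \in \mathbb{R}. \tag{25.75}
\]

Since $Y_k$ is of finite variance, this decomposition is possible and

\[
\mathrm{E}\bigl[(Y_k^{\mathcal{G}})^2\bigr],\ \mathrm{E}\bigl[(Y_k^{\perp})^2\bigr] < \infty. \tag{25.76}
\]

To prove that $Y_k$ is an element of $\mathcal{G}$ we shall next show that $\mathrm{E}\bigl[(Y_k^{\perp})^2\bigr] = 0$ or,
equivalently (in view of (25.74)), that

\[
\mathrm{E}\bigl[Y_k\, Y_k^{\perp}\bigr] = 0. \tag{25.77}
\]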

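To establish (25.77) we evaluate its LHS as follows:

\[
\begin{aligned}
\mathrm{E}\bigl[Y_k\, Y_k^{\perp}\bigr]
&= \mathrm{E}\!\left[\left(\int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu\, X(t_\nu)\right) Y_k^{\perp}\right] \\
&= \mathrm{E}\!\left[\left(\int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt\right) Y_k^{\perp}\right] + \sum_{\nu=1}^{n} \alpha_\nu\, \mathrm{E}\bigl[X(t_\nu)\, Y_k^{\perp}\bigr] \\
&= \mathrm{E}\!\left[\left(\int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt\right) Y_k^{\perp}\right] \\
&= \int_{-\infty}^{\infty} \mathrm{E}\bigl[X(t)\, s_k(t)\, Y_k^{\perp}\bigr]\, dt \\
&= \int_{-\infty}^{\infty} \mathrm{E}\bigl[X(t)\, Y_k^{\perp}\bigr]\, s_k(t)\, dt \\
&= 0,
\end{aligned}
\]

where the first equality follows from the definition of $Y_k$ (25.72); the second from
the linearity of expectation; the third from the orthogonality (25.75); the fourth by
an application of Fubini's Theorem that we shall justify shortly; the fifth because
$s_k(\cdot)$ is a deterministic function; and the final equality again by (25.75). This
establishes (25.77) subject to a verification that the conditions of Fubini's Theorem
are satisfied, a verification we conduct now. That $(\omega, t) \mapsto X(\omega, t)\, Y_k^{\perp}(\omega)\, s_k(t)$ is
measurable follows because $(X(t),\ t \in \mathbb{R})$ is a measurable SP; $Y_k^{\perp}$, being a RV,
is measurable with respect to $\mathcal{F}$; and because the Borel measurability of s(·) also
implies the Borel measurability of $s_k(\cdot)$. The integrability of this function follows
from the Cauchy-Schwarz Inequality for random variables

\[
\begin{aligned}
\int_{-\infty}^{\infty} \mathrm{E}\bigl[|X(t)\, Y_k^{\perp}|\bigr]\, |s_k(t)|\, dt
&\le \int_{-\infty}^{\infty} \sqrt{\mathrm{E}[X^2(t)]\, \mathrm{E}\bigl[(Y_k^{\perp})^2\bigr]}\; |s_k(t)|\, dt \\
&\le 2k\sqrt{k}\, \sqrt{K_{XX}(0)\, \mathrm{E}\bigl[(Y_k^{\perp})^2\bigr]} \\
&< \infty,
\end{aligned}
\]

where the second inequality follows from the definition of $s_k(\cdot)$ (25.70), and where
the third inequality follows from (25.76). This justifies the use of Fubini's Theorem
in the proof of (25.77). We have thus demonstrated that $Y_k$ is in $\mathcal{G}$, and hence, like
all elements of $\mathcal{G}$, is Gaussian. This concludes the proof of the Gaussianity of $Y_k$
for every $k \in \mathbb{N}$ and hence the Gaussianity of Y.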

It only remains to verify that the mean and variance of Y are as stated in the
theorem. The only part of the derivation of (25.67) that we have not yet justified
is the derivation of (25.66) and, in particular, the swapping of the expectation and
integration. But this is easily justified using Fubini's Theorem because, by (25.17),

\[
\int_{-\infty}^{\infty} \mathrm{E}\bigl[|X(t_\nu)\, X(t)|\bigr]\, |s(t)|\, dt
\le \Bigl(K_{XX}(0) + \bigl(\mathrm{E}[X(0)]\bigr)^{2}\Bigr)\, \|s\|_1 < \infty. \tag{25.78}
\]

□

Proposition 25.11.1 is extremely powerful because it allows us to determine the 
distribution of a linear functional of a Gaussian SP from its mean and variance. 
In the next section we shall extend this result and show that any finite number of 
linear functionals of a Gaussian SP are jointly Gaussian. Their joint distribution 
is thus fully determined by the mean vector and the covariance matrix, which, as 
we shall see, can be readily computed from the autocovariance function. 

25.12 The Joint Distribution of Linear Functionals 

Let us now shift our focus from the distribution of a single linear functional to the
joint distribution of a collection of such functionals. Specifically, we consider m
functionals

\[
\int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt + \sum_{\nu=1}^{n_j} \alpha_{j,\nu}\, X(t_{j,\nu}), \quad j = 1, \ldots, m \tag{25.79}
\]

of the measurable, stationary Gaussian SP (X(t)). Here the m real-valued
signals $s_1, \ldots, s_m$ are integrable, $n_1, \ldots, n_m$ are in $\mathbb{N}$, and $\alpha_{j,\nu}$, $t_{j,\nu}$ are deterministic
constants for all $\nu \in \{1, \ldots, n_j\}$.

The main result of this section is that if (X(t)) is a Gaussian SP, then the random 
variables in (25.79) are jointly Gaussian. 

Theorem 25.12.1 (Linear Functionals of a Gaussian SP Are Jointly Gaussian).

The m linear functionals

\[
\int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt + \sum_{\nu=1}^{n_j} \alpha_{j,\nu}\, X(t_{j,\nu}), \quad j = 1, \ldots, m
\]

of a measurable, stationary, Gaussian SP (X(t)) are jointly Gaussian, whenever
$m \in \mathbb{N}$; the m functions $\{s_j\}_{j=1}^{m}$ are integrable functions from $\mathbb{R}$ to $\mathbb{R}$; the
integers $\{n_j\}$ are nonnegative; and the coefficients $\{\alpha_{j,\nu}\}$ and the epochs $\{t_{j,\nu}\}$ are
deterministic real numbers for all $j \in \{1, \ldots, m\}$ and all $\nu \in \{1, \ldots, n_j\}$.

Proof. It suffices to show that any linear combination of these linear
functionals has a univariate Gaussian distribution (Theorem 23.6.17). This follows from
Proposition 25.11.1 and Lemma 25.10.3 because, by Lemma 25.10.3, for any choice
of the coefficients $\gamma_1, \ldots, \gamma_m \in \mathbb{R}$ the linear combination

\[
\gamma_1 \left(\int_{-\infty}^{\infty} X(t)\, s_1(t)\, dt + \sum_{\nu=1}^{n_1} \alpha_{1,\nu}\, X(t_{1,\nu})\right)
+ \cdots +
\gamma_m \left(\int_{-\infty}^{\infty} X(t)\, s_m(t)\, dt + \sum_{\nu=1}^{n_m} \alpha_{m,\nu}\, X(t_{m,\nu})\right)
\]

has the same distribution as the linear functional

\[
\int_{-\infty}^{\infty} X(t) \Bigl(\sum_{j=1}^{m} \gamma_j\, s_j(t)\Bigr)\, dt + \sum_{j=1}^{m} \sum_{\nu=1}^{n_j} \gamma_j\, \alpha_{j,\nu}\, X(t_{j,\nu}),
\]

which, by Proposition 25.11.1, has a univariate Gaussian distribution. □

It follows from Theorem 25.12.1 that if (X(t)) is a measurable, stationary, Gaussian
SP, then the joint distribution of the random variables in (25.79) is fully specified
by their means and their covariance matrix. If (X(t)) is centered, then by (25.62)
these random variables are centered, so their joint distribution is determined by
their covariance matrix. We next show how this covariance matrix can be computed
from the autocovariance function $K_{XX}$. To this end we assume that (X(t)) is
centered, and expand the covariance between any two such functionals as follows:

\[
\begin{aligned}
&\mathrm{Cov}\!\left[\int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt + \sum_{\nu=1}^{n_j} \alpha_{j,\nu}\, X(t_{j,\nu}),\
\int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt + \sum_{\nu'=1}^{n_k} \alpha_{k,\nu'}\, X(t_{k,\nu'})\right] \\
&\quad = \mathrm{Cov}\!\left[\int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt,\ \int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt\right] \\
&\qquad + \sum_{\nu=1}^{n_j} \alpha_{j,\nu}\, \mathrm{Cov}\!\left[X(t_{j,\nu}),\ \int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt\right] \\
&\qquad + \sum_{\nu'=1}^{n_k} \alpha_{k,\nu'}\, \mathrm{Cov}\!\left[X(t_{k,\nu'}),\ \int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt\right] \\
&\qquad + \sum_{\nu=1}^{n_j} \sum_{\nu'=1}^{n_k} \alpha_{j,\nu}\, \alpha_{k,\nu'}\, \mathrm{Cov}\bigl[X(t_{j,\nu}),\ X(t_{k,\nu'})\bigr],
\quad j, k \in \{1, \ldots, m\}.
\end{aligned} \tag{25.80}
\]

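The second and third terms on the RHS can be computed from the autocovariance
function $K_{XX}$ using (25.66). The fourth term can be computed from $K_{XX}$ by noting
that $\mathrm{Cov}[X(t_{j,\nu}), X(t_{k,\nu'})] = K_{XX}(t_{j,\nu} - t_{k,\nu'})$ (Definition 25.4.4). We now evaluate
the first term:

\[
\begin{aligned}
\mathrm{Cov}\!\left[\int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt,\ \int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt\right]
&= \mathrm{E}\!\left[\int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt \int_{-\infty}^{\infty} X(\tau)\, s_k(\tau)\, d\tau\right] \\
&= \mathrm{E}\!\left[\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} X(t)\, s_j(t)\, X(\tau)\, s_k(\tau)\, dt\, d\tau\right] \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \mathrm{E}[X(t)\, X(\tau)]\, s_j(t)\, s_k(\tau)\, dt\, d\tau \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} K_{XX}(t-\tau)\, s_j(t)\, s_k(\tau)\, dt\, d\tau,
\end{aligned} \tag{25.81}
\]

which is the generalization of (25.53). By changing variables from $(t, \tau)$ to $(t, \sigma)$,
where $\sigma = t - \tau$, we can obtain the generalization of (25.54). Starting from (25.81)

\[
\begin{aligned}
\mathrm{Cov}\!\left[\int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt,\ \int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt\right]
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} K_{XX}(t-\tau)\, s_j(t)\, s_k(\tau)\, dt\, d\tau \\
&= \int_{-\infty}^{\infty} K_{XX}(\sigma) \int_{-\infty}^{\infty} s_j(t)\, s_k(t-\sigma)\, dt\, d\sigma \\
&= \int_{-\infty}^{\infty} K_{XX}(\sigma) \int_{-\infty}^{\infty} s_j(t)\, \overleftarrow{s}_k(\sigma - t)\, dt\, d\sigma \\
&= \int_{-\infty}^{\infty} K_{XX}(\sigma)\, \bigl(s_j \star \overleftarrow{s}_k\bigr)(\sigma)\, d\sigma.
\end{aligned} \tag{25.82}
\]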



If (X(t)) is of PSD $S_{XX}$, then we can rewrite (25.82) in the frequency domain
using Proposition 6.2.4 in much the same way that we rewrote (25.46) in the form
(25.48):

\[
\mathrm{Cov}\!\left[\int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt,\ \int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt\right]
= \int_{-\infty}^{\infty} S_{XX}(f)\, \hat{s}_j(f)\, \hat{s}_k^{*}(f)\, df, \tag{25.83}
\]

where we have used the fact that the FT of $s_j \star \overleftarrow{s}_k$ is the product of the FT of $s_j$
and the FT of the mirror image $\overleftarrow{s}_k\colon t \mapsto s_k(-t)$, and that the FT of $\overleftarrow{s}_k$ is
$f \mapsto \hat{s}_k(-f)$, which, because $s_k$ is real, is also given by $f \mapsto \hat{s}_k^{*}(f)$.

The key second-order properties of linear functionals of measurable WSS stochastic
processes are summarized in the following theorem. Using these properties and
(25.80) we can compute the covariance matrix of the linear functionals in (25.79),
a matrix which fully specifies their joint distribution whenever (X(t)) is a centered
Gaussian SP.
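As an illustration of how such a covariance matrix might be computed in practice, the following sketch evaluates (25.81) for two functionals without point samples. The inputs are assumptions made for this example and are not from the text: $K_{XX}(\tau) = e^{-|\tau|}$, $s_1$ the unit pulse on [0, 1), and $s_2$ the unit pulse on [1, 2). For these choices the diagonal entries equal 2/e and the off-diagonal entries equal $(1 - 1/e)^2$.

```python
import numpy as np

def covariance_matrix(K, signals, t):
    """Entry (j, k) approximates the double integral (25.81) for the pair (s_j, s_k)."""
    dt = t[1] - t[0]
    S = np.stack([s(t) for s in signals]) * dt        # row j holds s_j(t_i) dt
    return S @ K(t[:, None] - t[None, :]) @ S.T

# Illustrative assumptions (not from the text):
K = lambda tau: np.exp(-np.abs(tau))
s1 = lambda t: ((t >= 0.0) & (t < 1.0)).astype(float)
s2 = lambda t: ((t >= 1.0) & (t < 2.0)).astype(float)

t = (np.arange(1000) + 0.5) * 2e-3                    # midpoints on [0, 2]
Sigma = covariance_matrix(K, [s1, s2], t)
```

For a centered Gaussian SP, a matrix computed this way would specify the joint distribution of the two functionals; its symmetry and positive semidefiniteness serve as quick sanity checks.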

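Theorem 25.12.2 (Covariance Properties of Linear Functionals of a WSS SP).

Let (X(t)) be a measurable WSS SP.

(i) If the real signal s is integrable, then

\[
\mathrm{Var}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right] = \int_{-\infty}^{\infty} K_{XX}(\sigma)\, R_{ss}(\sigma)\, d\sigma, \tag{25.84}
\]

where $R_{ss}$ is the self-similarity function of s. Furthermore, for every fixed
epoch $\tau \in \mathbb{R}$

\[
\mathrm{Cov}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt,\ X(\tau)\right] = \int_{-\infty}^{\infty} s(t)\, K_{XX}(\tau - t)\, dt, \quad \tau \in \mathbb{R}. \tag{25.85}
\]

If $s_1, s_2$ are real-valued integrable signals, then

\[
\mathrm{Cov}\!\left[\int_{-\infty}^{\infty} X(t)\, s_1(t)\, dt,\ \int_{-\infty}^{\infty} X(t)\, s_2(t)\, dt\right]
= \int_{-\infty}^{\infty} K_{XX}(\sigma)\, \bigl(s_1 \star \overleftarrow{s}_2\bigr)(\sigma)\, d\sigma. \tag{25.86}
\]

(ii) If (X(t)) is of PSD $S_{XX}$, then for s, $s_1$, $s_2$, and $\tau$ as above

\[
\mathrm{Var}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\right] = \int_{-\infty}^{\infty} S_{XX}(f)\, |\hat{s}(f)|^{2}\, df, \tag{25.87}
\]

\[
\mathrm{Cov}\!\left[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt,\ X(\tau)\right] = \int_{-\infty}^{\infty} S_{XX}(f)\, \hat{s}(f)\, e^{i 2\pi f \tau}\, df, \tag{25.88}
\]

and

\[
\mathrm{Cov}\!\left[\int_{-\infty}^{\infty} X(t)\, s_1(t)\, dt,\ \int_{-\infty}^{\infty} X(t)\, s_2(t)\, dt\right]
= \int_{-\infty}^{\infty} S_{XX}(f)\, \hat{s}_1(f)\, \hat{s}_2^{*}(f)\, df. \tag{25.89}
\]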

Proof. Most of these claims have already been proved. Indeed, (25.84) was proved
in Proposition 25.10.1 (vi), and (25.85) was proved in Proposition 25.11.1 using
Fubini's Theorem and (25.78). However, (25.86) was only derived heuristically
in (25.81) and (25.82). To rigorously justify this derivation one can use Fubini's
Theorem, or use the relation

\[
\mathrm{Cov}[X, Y] = \frac{1}{2} \bigl(\mathrm{Var}[X + Y] - \mathrm{Var}[X] - \mathrm{Var}[Y]\bigr)
\]

and the result for the variance, namely, (25.84).

All the results in Part (ii) of this theorem follow from the corresponding results in
Part (i) using the definition of the PSD and Proposition 6.2.4. □

[Block diagram: input (X(t)), filter block h(·), output (X(t)) ⋆ h.]

Figure 25.2: Passing a SP (X(t)) through a stable filter of impulse response h(·).
If (X(t)) is measurable and WSS, then so is the output (X(t)) ⋆ h. If, additionally,
(X(t)) is of PSD $S_{XX}$, then the output is of PSD $f \mapsto S_{XX}(f)\, |\hat{h}(f)|^2$. If (X(t)) is
additionally Gaussian, then so is the output.



25.13 Filtering WSS Processes 

We next discuss the result of passing a WSS SP through a stable filter, i.e., the 
convolution of a SP with a deterministic integrable function. Our main result is 
that, subject to some technical conditions, the following hold: 

(i) Passing a WSS SP through a stable filter produces a WSS SP. 

(ii) If the input to the filter is of PSD $S_{XX}$, then the output of the filter is of
PSD $f \mapsto S_{XX}(f)\, |\hat{h}(f)|^2$, where $\hat{h}(\cdot)$ is the filter's frequency response.

(iii) If the input to the filter is a Gaussian SP, then so is the output. 
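Claim (ii) can be probed numerically through the autocovariance of the output, which (as stated later in Theorem 25.13.2) is the convolution of the input autocovariance with the self-similarity function of the filter. The sketch below uses assumptions that are not in the text: the input autocovariance is $e^{-|\tau|}$, whose FT is $2/(1 + (2\pi f)^2)$, and h is the unit pulse on [0, 1), so that $R_{hh}$ is the triangle $(1 - |\tau|)^+$ and $|\hat{h}(f)|^2 = \mathrm{sinc}^2(f)$ with NumPy's normalized sinc. The output PSD is estimated at a single, arbitrarily chosen frequency by numerically Fourier transforming the output autocovariance.

```python
import numpy as np

d = 0.01
N = 2000
tau = np.arange(-N, N + 1) * d                      # symmetric lag grid on [-20, 20]

K_xx = np.exp(-np.abs(tau))                         # assumed input autocovariance
R_hh = np.clip(1.0 - np.abs(tau), 0.0, None)        # self-similarity of the unit pulse h

# Output autocovariance as the convolution of K_xx with R_hh, on the same lag grid
K_yy = np.convolve(K_xx, R_hh, mode="same") * d

def psd_from_autocovariance(K_vals, tau, f):
    """Numerical FT of a symmetric autocovariance function at frequency f."""
    return float(np.sum(K_vals * np.cos(2.0 * np.pi * f * tau)) * (tau[1] - tau[0]))

f0 = 0.3                                            # arbitrary test frequency
S_yy_numeric = psd_from_autocovariance(K_yy, tau, f0)
S_yy_formula = float(np.sinc(f0) ** 2 * 2.0 / (1.0 + (2.0 * np.pi * f0) ** 2))
```

The value of `K_yy` at lag zero is the output power, which for this example coincides with the variance 2/e of the linear functional with s equal to the unit pulse.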

We state this result in Theorem 25.13.2. But first we must define the convolution
of a SP with an integrable deterministic signal. Our approach is to build on our
definition of linear functionals of WSS stochastic processes (Section 25.10) and to
define the convolution of (X(t)) with h(·) as the SP that maps every epoch $t \in \mathbb{R}$
to the RV

\[
\int_{-\infty}^{\infty} X(\sigma)\, h(t - \sigma)\, d\sigma,
\]

where the above integral is the linear functional

\[
\int_{-\infty}^{\infty} X(\sigma)\, s(\sigma)\, d\sigma \quad \text{with } s\colon \sigma \mapsto h(t - \sigma).
\]

With this approach the key results will follow by applying Theorem 25.12.2 with
the proper substitutions.

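Definition 25.13.1 (Filtering a Stochastic Process). The convolution of a
measurable, WSS SP (X(t)) with an integrable function $h\colon \mathbb{R} \to \mathbb{R}$ is denoted by

\[
(X(t)) \star h
\]

and is defined as the SP that maps every $t \in \mathbb{R}$ to the RV

\[
\int_{-\infty}^{\infty} X(\sigma)\, h(t - \sigma)\, d\sigma, \tag{25.90}
\]

where the stochastic integral in (25.90) is the stochastic integral that was defined
in Note 25.10.2

\[
\omega \mapsto
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} X(\omega, \sigma)\, h(t - \sigma)\, d\sigma & \text{if } \displaystyle\int_{-\infty}^{\infty} |X(\omega, \sigma)\, h(t - \sigma)|\, d\sigma < \infty, \\
0 & \text{otherwise.}
\end{cases}
\]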

Theorem 25.13.2. Let (Y(t)) be the result of convolving the measurable, centered,
WSS SP (X(t)) of autocovariance function $K_{XX}$ with the integrable function
$h\colon \mathbb{R} \to \mathbb{R}$.

(i) The SP (Y(t)) is centered, measurable, and WSS with autocovariance function

\[ K_{YY} = K_{XX} \star R_{hh}, \tag{25.91} \]

where $R_{hh}$ is the self-similarity function of h (Section 11.4).

(ii) If (X(t)) is of PSD $S_{XX}$, then (Y(t)) is of PSD

\[ S_{YY}(f) = |\hat{h}(f)|^2\, S_{XX}(f), \quad f \in \mathbb{R}. \tag{25.92} \]

(iii) For every $t, \tau \in \mathbb{R}$,

\[ \mathsf{E}[X(t)\,Y(t+\tau)] = (K_{XX} \star h)(\tau), \tag{25.93} \]

where the RHS does not depend on t.⁸

(iv) If (X(t)) is Gaussian, then so is (Y(t)). Moreover, for every choice of
$n, m \in \mathbb{N}$ and for every choice of the epochs $t_1, \ldots, t_n, t_{n+1}, \ldots, t_{n+m} \in \mathbb{R}$,
the random variables

\[ X(t_1), \ldots, X(t_n),\, Y(t_{n+1}), \ldots, Y(t_{n+m}) \tag{25.94} \]

are jointly Gaussian.⁹

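Proof. For fixed $t, \tau \in \mathbb{R}$ we use Definition 25.13.1 to express Y(t) and $Y(t+\tau)$ as

\[ Y(t) = \int_{-\infty}^{\infty} X(\sigma)\, s_1(\sigma)\, d\sigma, \tag{25.95} \]

and

\[ Y(t+\tau) = \int_{-\infty}^{\infty} X(\sigma)\, s_2(\sigma)\, d\sigma, \tag{25.96} \]

where

\[ s_1\colon \sigma \mapsto h(t-\sigma), \tag{25.97} \]

\[ s_2\colon \sigma \mapsto h(t+\tau-\sigma). \tag{25.98} \]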



8 Two stochastic processes (X(t)) and (Y(t)) are said to be jointly wide-sense stationary
if each is WSS and if $\mathsf{E}[X(t)\,Y(t+\tau)]$ does not depend on t.

9 That is, (X(t)) and (Y(t)) are jointly Gaussian processes.
We are now ready to prove Part (i). That (Y(t)) is centered follows from the
representation of Y(t) in (25.95) & (25.97) as a linear functional of (X(t)) and
from the hypothesis that (X(t)) is centered (Proposition 25.10.1).

To establish that (Y(t)) is WSS we use the representations (25.95)-(25.98) and
Theorem 25.12.2 regarding the covariance between two linear functionals as follows.



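\begin{align*}
\operatorname{Cov}[Y(t+\tau), Y(t)] &= \operatorname{Cov}\Bigl[\int_{-\infty}^{\infty} X(\sigma)\, s_2(\sigma)\, d\sigma,\ \int_{-\infty}^{\infty} X(\sigma)\, s_1(\sigma)\, d\sigma\Bigr] \\
&= \int_{-\infty}^{\infty} K_{XX}(\sigma)\, (s_2 \star \tilde{s}_1)(\sigma)\, d\sigma, \tag{25.99}
\end{align*}

where $\tilde{s}_1\colon \sigma \mapsto s_1(-\sigma)$ denotes the mirror image of $s_1$, and where the convolution can be evaluated as

\begin{align*}
(s_2 \star \tilde{s}_1)(\sigma) &= \int_{-\infty}^{\infty} s_2(\mu)\, \tilde{s}_1(\sigma - \mu)\, d\mu \\
&= \int_{-\infty}^{\infty} h(t + \tau - \mu)\, h(t + \sigma - \mu)\, d\mu \\
&= \int_{-\infty}^{\infty} h(\tilde{\mu} + \tau - \sigma)\, h(\tilde{\mu})\, d\tilde{\mu} \\
&= R_{hh}(\tau - \sigma), \tag{25.100}
\end{align*}

where $\tilde{\mu} = t + \sigma - \mu$. Combining (25.99) with (25.100) yields

\[ \operatorname{Cov}[Y(t+\tau), Y(t)] = (K_{XX} \star R_{hh})(\tau), \quad t, \tau \in \mathbb{R}, \tag{25.101} \]

where the RHS does not depend on t. This establishes that (Y(t)) is WSS and
proves (25.91).¹⁰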

To conclude the proof of Part (i) we now show that (Y(t)) is measurable. The proof 
is technical and requires background in Measure Theory. Readers are encouraged 
to skip it and move on to the proof of Part (ii). 

We first note that, as in the proof of Proposition 25.10.1, it suffices to prove the 
result for impulse response functions h that are Borel measurable; the extension 
to Lebesgue measurable functions will then follow by approximating h by a Borel 
measurable function that differs from it on a set of Lebesgue measure zero (Rudin, 
1974, Chapter 7, Lemma 1) and by then applying Part (ii) of Note 25.10.2. We 
hence now assume that h is Borel measurable. 

We shall prove that (Y(t)) is measurable by proving that the (nonstationary)
process $(\omega,t) \mapsto Y(\omega,t)/(1+t^2)$ is measurable. This we shall prove using Fubini's
Theorem applied to the function from $(\Omega \times \mathbb{R}) \times \mathbb{R}$ to $\mathbb{R}$ defined by

\[ \bigl((\omega,t),\sigma\bigr) \mapsto \frac{X(\omega,\sigma)\, h(t-\sigma)}{1+t^2}, \quad (\omega,t) \in \Omega \times \mathbb{R},\ \sigma \in \mathbb{R}. \tag{25.102} \]

This function is measurable because, by assumption, (X(t)) is measurable and
because the measurability of the function $h(\cdot)$ implies the measurability of the



10 That (Y(t)) is of finite variance follows from (25.101) by setting $\tau = 0$ and noting that
the convolution on the RHS of (25.101) is between a bounded function ($K_{XX}$) and an integrable
function ($R_{hh}$) and is thus defined and finite at every $\tau \in \mathbb{R}$ and a fortiori at $\tau = 0$.
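function $(t,\sigma) \mapsto h(t-\sigma)$ (as proved, for example, in (Rudin, 1974, p. 157)). We
next verify that this function is integrable. To that end, we first integrate its
absolute value over $(\omega,t)$ and then over $\sigma$. The integral over $(\omega,t)$ is given by

\[ \int_{t=-\infty}^{\infty} \frac{\mathsf{E}[|X(\sigma)|]\, |h(t-\sigma)|}{1+t^2}\, dt \le \sqrt{K_{XX}(0)} \int_{t=-\infty}^{\infty} \frac{|h(t-\sigma)|}{1+t^2}\, dt, \]

where the inequality follows from (25.16) and from our assumption that (X(t)) is
centered. We next need to integrate the RHS over $\sigma$. Invoking Fubini's Theorem
to exchange the order of integration over t and $\sigma$ we obtain that the integral of the
absolute value of the function defined in (25.102) is upper-bounded by

\begin{align*}
\sqrt{K_{XX}(0)} \int_{\sigma=-\infty}^{\infty} \int_{t=-\infty}^{\infty} \frac{|h(t-\sigma)|}{1+t^2}\, dt\, d\sigma
&= \sqrt{K_{XX}(0)} \int_{t=-\infty}^{\infty} \int_{\sigma=-\infty}^{\infty} \frac{|h(t-\sigma)|}{1+t^2}\, d\sigma\, dt \\
&= \sqrt{K_{XX}(0)} \int_{t=-\infty}^{\infty} \frac{\|h\|_1}{1+t^2}\, dt \\
&= \pi \sqrt{K_{XX}(0)}\, \|h\|_1 \\
&< \infty.
\end{align*}

Having established that the function in (25.102) is measurable and integrable, we
can now use Fubini's Theorem to deduce that its integral over $\sigma$ is measurable as
a mapping of $(\omega,t)$, i.e., that the mapping

\[ (\omega,t) \mapsto \int_{-\infty}^{\infty} \frac{X(\omega,\sigma)\, h(t-\sigma)}{1+t^2}\, d\sigma \tag{25.103} \]

is measurable. Since the RHS of (25.103) is $Y(\omega,t)/(1+t^2)$, we conclude that the
mapping $(\omega,t) \mapsto Y(\omega,t)/(1+t^2)$ is measurable and hence also $(\omega,t) \mapsto Y(\omega,t)$.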

We next prove Part (ii) using (25.91) and Proposition 6.2.5. Because h is integrable,
its self-similarity function $R_{hh}$ is integrable and of FT

\[ \hat{R}_{hh}(f) = |\hat{h}(f)|^2, \quad f \in \mathbb{R} \tag{25.104} \]

(Section 11.4). And since, by assumption, (X(t)) is of PSD $S_{XX}$, it follows that $S_{XX}$
is integrable and that its IFT is $K_{XX}$:

\[ K_{XX}(\tau) = \int_{-\infty}^{\infty} S_{XX}(f)\, e^{i 2\pi f \tau}\, df, \quad \tau \in \mathbb{R}. \tag{25.105} \]

Consequently, by Proposition 6.2.5,

\[ (K_{XX} \star R_{hh})(\tau) = \int_{-\infty}^{\infty} |\hat{h}(f)|^2\, S_{XX}(f)\, e^{i 2\pi f \tau}\, df, \quad \tau \in \mathbb{R}. \]

Combining this with (25.91) yields

\[ K_{YY}(\tau) = \int_{-\infty}^{\infty} |\hat{h}(f)|^2\, S_{XX}(f)\, e^{i 2\pi f \tau}\, df, \quad \tau \in \mathbb{R}, \]

and thus establishes that the PSD of (Y(t)) is as given in (25.92).
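Parts (i) and (ii) can be checked numerically on a concrete pair. The sketch below is only an illustration under assumptions that are not in the text: the input autocovariance is taken to be $K_{XX}(\tau) = e^{-|\tau|}$ (so $S_{XX}(f) = 2/(1+4\pi^2 f^2)$) and the filter is taken to be $h(t) = e^{-t}\,I\{t \ge 0\}$ (so $R_{hh}(\tau) = e^{-|\tau|}/2$ and $|\hat{h}(f)|^2 = 1/(1+4\pi^2 f^2)$). The helper names, the truncation limits, and the midpoint rule are likewise illustrative choices. It evaluates $K_{YY}$ once via the time-domain convolution (25.91) and once via the inverse Fourier Transform of (25.92).

```python
import math

def midpoint(func, lo, hi, n):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(func(lo + (k + 0.5) * step) for k in range(n))

def k_xx(tau):
    """Assumed input autocovariance: exp(-|tau|)."""
    return math.exp(-abs(tau))

def s_xx(f):
    """PSD of the assumed input, i.e. the FT of exp(-|tau|)."""
    return 2.0 / (1.0 + (2.0 * math.pi * f) ** 2)

def r_hh(tau):
    """Self-similarity function of the assumed filter h(t) = exp(-t) I{t >= 0}."""
    return 0.5 * math.exp(-abs(tau))

def h_hat_sq(f):
    """|h_hat(f)|^2 for the same assumed filter."""
    return 1.0 / (1.0 + (2.0 * math.pi * f) ** 2)

def k_yy_time(tau):
    """(K_XX * R_hh)(tau) as in (25.91); the integral is truncated to [-40, 40]."""
    return midpoint(lambda s: k_xx(s) * r_hh(tau - s), -40.0, 40.0, 80_000)

def k_yy_freq(tau):
    """IFT of |h_hat|^2 S_XX as in (25.92); the integral is truncated to [-50, 50].
    The integrand is symmetric in f, so only the cosine part contributes."""
    return midpoint(
        lambda f: h_hat_sq(f) * s_xx(f) * math.cos(2.0 * math.pi * f * tau),
        -50.0, 50.0, 100_000,
    )

def k_yy_closed(tau):
    """Closed form for this particular pair: (1 + |tau|) exp(-|tau|) / 2."""
    return 0.5 * (1.0 + abs(tau)) * math.exp(-abs(tau))

if __name__ == "__main__":
    for tau in (0.0, 0.7, 1.5):
        print(tau, k_yy_time(tau), k_yy_freq(tau), k_yy_closed(tau))
```

For this pair the two routes should agree with each other and with the closed form up to the quadrature and truncation error.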

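We next turn to Part (iii). To establish (25.93) we use the representation (25.96)
& (25.98) and Theorem 25.12.2:

\begin{align*}
\mathsf{E}[X(t)\,Y(t+\tau)] &= \operatorname{Cov}\Bigl[X(t),\ \int_{-\infty}^{\infty} X(\sigma)\, s_2(\sigma)\, d\sigma\Bigr] \\
&= \int_{-\infty}^{\infty} s_2(\sigma)\, K_{XX}(t-\sigma)\, d\sigma \\
&= \int_{-\infty}^{\infty} h(t+\tau-\sigma)\, K_{XX}(t-\sigma)\, d\sigma \\
&= \int_{-\infty}^{\infty} K_{XX}(-\mu)\, h(\tau-\mu)\, d\mu \\
&= \int_{-\infty}^{\infty} K_{XX}(\mu)\, h(\tau-\mu)\, d\mu \\
&= (K_{XX} \star h)(\tau), \quad \tau \in \mathbb{R},
\end{align*}

where $\mu = \sigma - t$, and where we have used the symmetry of the autocovariance
function.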

Finally we prove Part (iv). The proof is a simple application of Theorem 25.12.1.
To prove that (Y(t)) is a Gaussian process we need to show that, for every positive
integer n and for every choice of the epochs $t_1, \ldots, t_n$, the random variables
$Y(t_1), \ldots, Y(t_n)$ are jointly Gaussian. This follows directly from Theorem 25.12.1
because $Y(t_\nu)$ can be expressed as

\begin{align*}
Y(t_\nu) &= \int_{-\infty}^{\infty} X(\sigma)\, h(t_\nu - \sigma)\, d\sigma \\
&= \int_{-\infty}^{\infty} X(\sigma)\, s_\nu(\sigma)\, d\sigma, \quad \nu = 1, \ldots, n,
\end{align*}

where

\[ s_\nu\colon \sigma \mapsto h(t_\nu - \sigma), \quad \nu = 1, \ldots, n \]

are all integrable.

The joint Gaussianity of the random variables in (25.94) can also be deduced from
Theorem 25.12.1. Indeed, $X(t_\nu)$ can be trivially expressed as the functional

\[ X(t_\nu) = \int_{-\infty}^{\infty} X(\sigma)\, s_\nu(\sigma)\, d\sigma + \alpha_\nu X(t_\nu), \quad \nu = 1, \ldots, n \]

when $s_\nu$ is chosen to be the zero function and when $\alpha_\nu$ is chosen as 1, and $Y(t_\nu)$
can be similarly expressed as

\[ Y(t_\nu) = \int_{-\infty}^{\infty} X(\sigma)\, s_\nu(\sigma)\, d\sigma + \alpha_\nu X(t_\nu), \quad \nu = n+1, \ldots, n+m \]

when $s_\nu\colon \sigma \mapsto h(t_\nu - \sigma)$ and $\alpha_\nu = 0$. □
The mathematically astute reader may have noted that, in defining the result of
passing a WSS SP through a stable filter of impulse response h, we did not preclude
the possibility that for every $\omega$ there may be some epochs t for which the mapping
$\sigma \mapsto X(\omega,\sigma)\, h(t-\sigma)$ is not integrable. So far, we have only established that
for every epoch t the set $M_t$ of $\omega$'s for which this mapping is not integrable is of
probability zero.

We next show that if h is well-behaved in the sense that it is not only integrable
but also satisfies

\[ \int_{-\infty}^{\infty} h^2(t)\,(1+t^2)\, dt < \infty, \tag{25.106} \]

then whenever $\omega$ is outside some set $M \subset \Omega$ of probability zero, the mapping
$\sigma \mapsto X(\omega,\sigma)\, h(t-\sigma)$ is integrable for all $t \in \mathbb{R}$. Thus, for $\omega$'s outside this set of
probability zero, we can think of the response of the filter as being the convolution
of the trajectory $t \mapsto X(\omega,t)$ and the impulse response $t \mapsto h(t)$. For such $\omega$'s this
convolution never blows up.

We show this in two steps. In the first step we note that if h satisfies (25.106) and
if the trajectory $t \mapsto X(\omega,t)$ satisfies

\[ \int_{-\infty}^{\infty} \frac{X^2(\omega,t)}{1+t^2}\, dt < \infty, \tag{25.107} \]

then the function $\sigma \mapsto X(\omega,\sigma)\, h(t-\sigma)$ is integrable for every $t \in \mathbb{R}$ (Proposition 3.4.4).
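One way to see why (25.106) and (25.107) together suffice is the following sketch based on the Cauchy-Schwarz Inequality. It is offered as an illustration under the stated hypotheses and is not claimed to be the argument behind Proposition 3.4.4; the factor $2(1+t^2)$ is a convenient bound and not the best possible one.

```latex
\begin{align*}
\int_{-\infty}^{\infty} |X(\omega,\sigma)\, h(t-\sigma)|\, d\sigma
 &= \int_{-\infty}^{\infty} \frac{|X(\omega,\sigma)|}{\sqrt{1+\sigma^2}}
    \cdot |h(t-\sigma)|\sqrt{1+\sigma^2}\; d\sigma \\
 &\le \Bigl(\int_{-\infty}^{\infty} \frac{X^2(\omega,\sigma)}{1+\sigma^2}\, d\sigma\Bigr)^{1/2}
      \Bigl(\int_{-\infty}^{\infty} h^2(t-\sigma)\,(1+\sigma^2)\, d\sigma\Bigr)^{1/2} \\
 &\le \Bigl(\int_{-\infty}^{\infty} \frac{X^2(\omega,\sigma)}{1+\sigma^2}\, d\sigma\Bigr)^{1/2}
      \Bigl(2\,(1+t^2)\int_{-\infty}^{\infty} h^2(u)\,(1+u^2)\, du\Bigr)^{1/2}
  < \infty.
\end{align*}
% The last step substitutes u = t - \sigma and uses
% 1 + (t-u)^2 \le 1 + 2t^2 + 2u^2 \le 2\,(1+t^2)(1+u^2).
```

The first factor is finite by (25.107) and the second by (25.106), and the bound holds for every fixed $t \in \mathbb{R}$.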

In the second step we show that outside a set of $\omega$'s of probability zero, all the
trajectories $t \mapsto X(\omega,t)$ satisfy (25.107):

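Lemma 25.13.3. Let (X(t)) be a WSS measurable SP defined over the probability
space $(\Omega, \mathcal{F}, P)$. Then

\[ \mathsf{E}\Bigl[\int_{-\infty}^{\infty} \frac{X^2(t)}{1+t^2}\, dt\Bigr] < \infty, \tag{25.108} \]

and the set

\[ \Bigl\{\omega \in \Omega : \int_{-\infty}^{\infty} \frac{X^2(\omega,t)}{1+t^2}\, dt < \infty\Bigr\} \tag{25.109} \]

is an event of probability one.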



Proof. Since (X(t)) is measurable, the mapping

\[ (\omega,t) \mapsto \frac{X^2(\omega,t)}{1+t^2} \tag{25.110} \]

is nonnegative and measurable. By Fubini's Theorem it follows that if we define

\[ W(\omega) = \int_{-\infty}^{\infty} \frac{X^2(\omega,t)}{1+t^2}\, dt, \quad \omega \in \Omega, \tag{25.111} \]

then W is a nonnegative RV taking value in the interval $[0, \infty]$. Consequently, the
set $\{\omega \in \Omega : W(\omega) < \infty\}$ is measurable. Moreover, by Fubini's Theorem,

\begin{align*}
\mathsf{E}[W] &= \mathsf{E}\Bigl[\int_{-\infty}^{\infty} \frac{X^2(t)}{1+t^2}\, dt\Bigr] \\
&= \int_{-\infty}^{\infty} \frac{\mathsf{E}[X^2(t)]}{1+t^2}\, dt \\
&= \mathsf{E}[X^2(0)] \int_{-\infty}^{\infty} \frac{1}{1+t^2}\, dt \\
&= \pi\, \mathsf{E}[X^2(0)] \\
&< \infty.
\end{align*}

Thus, W is a RV taking value in the interval $[0, \infty]$ and having finite expectation,
so the event $\{\omega \in \Omega : W(\omega) < \infty\}$ must be of probability one. □



25.14 The PSD Revisited 

Theorem 25.13.2 describes the PSD of the output of a stable filter that is fed a 
WSS SP (X(t)). By integrating this PSD, we obtain the value at the origin of the 
autocovariance function of the filter's output (see (25.30)). Since the latter is the 
power of the filter's output (Corollary 25.9.3), we have: 

Theorem 25.14.1 (Wiener-Khinchin). If a measurable, centered, WSS SP (X(t))
of autocovariance function $K_{XX}$ is passed through a stable filter of impulse response
$h\colon \mathbb{R} \to \mathbb{R}$, then the average power of the filter's output is given by

\[ \text{Power of } X \star h = \langle K_{XX}, R_{hh} \rangle. \tag{25.112} \]

If, additionally, (X(t)) is of PSD $S_{XX}$, then this power is given by

\[ \text{Power of } X \star h = \int_{-\infty}^{\infty} S_{XX}(f)\, |\hat{h}(f)|^2\, df. \tag{25.113} \]

Proof. To prove (25.112), we note that by (25.91) the autocovariance function of
the filtered process is $K_{XX} \star R_{hh}$, which evaluates at the origin to (25.112). The
result thus follows from Proposition 25.9.2, which shows that the power in the
filtered process is given by its autocovariance function evaluated at the origin.

To prove (25.113), we note that $K_{XX}$ is the IFT of $S_{XX}$ and that, by (11.35),
$\hat{R}_{hh}(f) = |\hat{h}(f)|^2$, so the RHS of (25.113) is equal to the RHS of (25.112) by
Proposition 6.2.4. □
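The equality of (25.112) and (25.113) can be illustrated numerically. The sketch below assumes, purely for illustration, the autocovariance $K_{XX}(\tau) = e^{-|\tau|}$ with PSD $S_{XX}(f) = 2/(1+4\pi^2 f^2)$ and the box filter $h(t) = I\{|t| \le 1/2\}$, for which $R_{hh}(\tau) = (1-|\tau|)\,I\{|\tau| \le 1\}$ and $|\hat{h}(f)|^2 = \operatorname{sinc}^2(f)$. None of these choices, nor the function names, come from the text. For this pair the time-domain side evaluates in closed form to $2/e$.

```python
import math

def midpoint(func, lo, hi, n):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(func(lo + (k + 0.5) * step) for k in range(n))

def k_xx(tau):
    """Assumed autocovariance of the input: exp(-|tau|)."""
    return math.exp(-abs(tau))

def s_xx(f):
    """PSD of the assumed input: the FT of exp(-|tau|)."""
    return 2.0 / (1.0 + (2.0 * math.pi * f) ** 2)

def r_hh(tau):
    """Self-similarity function of the assumed filter h(t) = I{|t| <= 1/2}."""
    return max(0.0, 1.0 - abs(tau))

def h_hat_sq(f):
    """|h_hat(f)|^2 = (sin(pi f) / (pi f))^2 for the same filter."""
    if f == 0.0:
        return 1.0
    x = math.pi * f
    return (math.sin(x) / x) ** 2

def power_time():
    """<K_XX, R_hh> as in (25.112); R_hh vanishes outside [-1, 1]."""
    return midpoint(lambda t: k_xx(t) * r_hh(t), -1.0, 1.0, 20_000)

def power_freq():
    """Integral of S_XX |h_hat|^2 as in (25.113), truncated to [-50, 50]."""
    return midpoint(lambda f: s_xx(f) * h_hat_sq(f), -50.0, 50.0, 100_000)

if __name__ == "__main__":
    print(power_time(), power_freq(), 2.0 / math.e)
```

The frequency-domain integrand decays like $f^{-4}$, which is why truncating it to a finite interval costs little accuracy in this example.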

We next show that for WSS stochastic processes, the operational PSD (Defini- 
tion 15.3.1) and the PSD (Definition 25.7.2) are equivalent. That is, a WSS SP 
has an operational PSD if, and only if, it has a PSD, and if the two exist, then they 
are equal (outside a set of frequencies of Lebesgue measure zero) . Before stating 
this as a theorem, we present a lemma that will be needed in the proof. It is very 
much in the spirit of Lemma 15.3.2. 
Lemma 25.14.2. Let $g\colon \mathbb{R} \to \mathbb{R}$ be a symmetric continuous function satisfying the
condition that for every integrable real signal $h\colon \mathbb{R} \to \mathbb{R}$

\[ \int_{-\infty}^{\infty} g(t)\, R_{hh}(t)\, dt = 0. \tag{25.114} \]

Then g is the all-zero function.

Proof. For every $a > 0$ consider the function

\[ h(t) = \frac{1}{\sqrt{a}}\, I\{|t| \le a/2\}, \quad t \in \mathbb{R}, \]

whose self-similarity function is

\[ R_{hh}(t) = \Bigl(1 - \frac{|t|}{a}\Bigr)\, I\{|t| \le a\}, \quad t \in \mathbb{R}. \tag{25.115} \]

Since h is integrable, it follows from (25.114) that

\begin{align*}
0 &= \int_{-\infty}^{\infty} g(t)\, R_{hh}(t)\, dt \\
&= 2 \int_{0}^{\infty} g(t)\, R_{hh}(t)\, dt \\
&= 2 \int_{0}^{a} g(t)\, \Bigl(1 - \frac{t}{a}\Bigr)\, dt, \quad a > 0, \tag{25.116}
\end{align*}

where the second equality follows from the hypothesis that $g(\cdot)$ is symmetric and
from the symmetry of $R_{hh}$, and where the third equality follows from (25.115).
Defining

\[ G(t) = \int_{0}^{t} g(\xi)\, d\xi, \quad t \ge 0, \tag{25.117} \]

and using integration by parts, we obtain from (25.116) that

\[ 0 = G(\xi)\Bigl(1 - \frac{\xi}{a}\Bigr)\Big|_{0}^{a} + \frac{1}{a} \int_{0}^{a} G(\xi)\, d\xi, \quad a > 0, \]

from which we obtain

\[ a\, G(0) = \int_{0}^{a} G(\xi)\, d\xi, \quad a > 0. \]

Differentiating with respect to a yields

\[ G(0) = G(a), \quad a \ge 0, \]

which combines with (25.117) to yield

\[ \int_{0}^{a} g(t)\, dt = 0, \quad a \ge 0. \tag{25.118} \]

Differentiating with respect to a and using the continuity of g (Rudin, 1976, Chapter 6,
Theorem 6.20) yields that g(a) is zero for all $a \ge 0$ and hence, by its
symmetry, for all $a \in \mathbb{R}$. □
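The triangular self-similarity function (25.115) on which the proof rests can be checked by direct numerical integration. The following is a minimal sketch, assuming the definition $R_{hh}(\tau) = \int h(t+\tau)\,h(t)\,dt$ and using a plain midpoint rule; the function names and the grid size are illustrative choices only.

```python
import math

def box(t, a):
    """h(t) = 1/sqrt(a) for |t| <= a/2 and 0 otherwise."""
    return 1.0 / math.sqrt(a) if abs(t) <= a / 2.0 else 0.0

def self_similarity(tau, a, n=20_000):
    """Midpoint-rule approximation of R_hh(tau) = integral of h(t + tau) h(t) dt.
    Since h(t) vanishes for |t| > a/2, it suffices to integrate over [-a/2, a/2]."""
    step = a / n
    total = 0.0
    for k in range(n):
        t = -a / 2.0 + (k + 0.5) * step
        total += box(t + tau, a) * box(t, a)
    return step * total

def triangle(tau, a):
    """RHS of (25.115): (1 - |tau|/a) for |tau| <= a and 0 otherwise."""
    return max(0.0, 1.0 - abs(tau) / a)

if __name__ == "__main__":
    a = 2.0
    for tau in (0.0, 0.5, -1.3, 2.5):
        print(tau, self_similarity(tau, a), triangle(tau, a))
```

Because the integrand is piecewise constant, the midpoint rule is accurate here to within about one grid cell, which is ample for a sanity check.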
Theorem 25.14.3 (The PSD and Operational PSD of a WSS SP). Let (X(t))
be a measurable, centered, WSS SP of a continuous autocovariance function $K_{XX}$.
Let $S(\cdot)$ be a nonnegative, symmetric, integrable function. Then the following two
conditions are equivalent:

(a) $K_{XX}$ is the Inverse Fourier Transform of $S(\cdot)$.

(b) For every integrable $h\colon \mathbb{R} \to \mathbb{R}$, the power in $X \star h$ is given by

\[ \int_{-\infty}^{\infty} S(f)\, |\hat{h}(f)|^2\, df. \tag{25.119} \]

Proof. That (a) implies (b) follows from the Wiener-Khinchin Theorem because
(a) implies that (X(t)) is of PSD $S(\cdot)$. It remains to prove that (b) implies (a).
To this end we now assume that Condition (b) is satisfied and proceed to prove
that $K_{XX}$ must then be equal to the IFT of $S(\cdot)$. By Theorem 25.14.1, the power
in $X \star h$ is given by (25.112). Consequently Condition (b) implies that

\[ \int_{-\infty}^{\infty} S(f)\, |\hat{h}(f)|^2\, df = \int_{-\infty}^{\infty} K_{XX}(\tau)\, R_{hh}(\tau)\, d\tau, \tag{25.120} \]

for every integrable $h\colon \mathbb{R} \to \mathbb{R}$.

If h is integrable, then the FT of $R_{hh}$ is the mapping $f \mapsto |\hat{h}(f)|^2$ (see (11.35)). If,
in addition, h is a real signal, then $R_{hh}$ is a symmetric function, and its IFT is thus
identical to its FT (Proposition 6.2.3 (ii)). Thus, if h is real and integrable, then
the IFT of $R_{hh}$ is the mapping $f \mapsto |\hat{h}(f)|^2$. (Using the dummy variable f for the
IFT is unusual but legitimate.) Consequently, by Proposition 6.2.4 (applied with
the substitution of $S(\cdot)$ for x and of $R_{hh}$ for g),

\[ \int_{-\infty}^{\infty} S(f)\, |\hat{h}(f)|^2\, df = \int_{-\infty}^{\infty} \hat{S}(\tau)\, R_{hh}(\tau)\, d\tau. \tag{25.121} \]

By (25.120) & (25.121) and by the symmetry of $S(\cdot)$ (which implies that $\hat{S} = \check{S}$)
we obtain that

\[ \int_{-\infty}^{\infty} \bigl(\check{S}(\tau) - K_{XX}(\tau)\bigr)\, R_{hh}(\tau)\, d\tau = 0, \quad h \in \mathcal{L}_1. \tag{25.122} \]

It thus follows from Lemma 25.14.2 that the mapping $\tau \mapsto \check{S}(\tau) - K_{XX}(\tau)$ is the
all-zero function, and Condition (a) is established. □



25.15 White Gaussian Noise 

The most important continuous-time SP in Digital Communications is white 
Gaussian noise, which is often used to model the additive noise in communi- 
cation systems. In this section we define this process and study its key properties. 
Our definition differs from the one in most textbooks, most notably in that we de- 
fine white Gaussian noise only with respect to some given bandwidth W. We give 
our reasons and comment on the implications in Section 25.15.2 after providing 
our definition and deriving the key results. 
[Figure: plot of $S_{NN}(f)$ against f; the PSD is at the level $N_0/2$ for f between $-W$ and $W$.]

Figure 25.3: The PSD of a SP (N(t)) which is of double-sided spectral density
$N_0/2$ with respect to the bandwidth W.



25.15.1 Definition and Main Properties 

The parameters defining white Gaussian noise are the bandwidth W with respect
to which the process is white and the double-sided spectral density $N_0/2$.

Definition 25.15.1 (White Gaussian Noise). We say that (N(t)) is white Gaussian
noise of double-sided spectral density $N_0/2$ with respect to the bandwidth W
if (N(t)) is a measurable, stationary, centered, Gaussian SP that has a
PSD $S_{NN}$ satisfying

\[ S_{NN}(f) = \frac{N_0}{2}, \quad f \in [-W, W]. \tag{25.123} \]

An example of the PSD of white Gaussian noise of double-sided spectral density
$N_0/2$ with respect to the bandwidth W is depicted in Figure 25.3. Note that
our definition of white Gaussian noise only specifies the PSD for frequencies f
satisfying $|f| \le W$. We leave the value of the PSD at other frequencies unspecified.
But the PSD should, of course, be a valid PSD, i.e., it must be nonnegative,
symmetric, and integrable (Definition 25.7.2). Recall also that by Proposition 25.7.3
every nonnegative, symmetric, integrable function is the PSD of some measurable
stationary Gaussian SP.¹¹

The following proposition summarizes the key properties of white Gaussian noise.
The reader is encouraged to recall the definition of an integrable function that is
bandlimited to W Hz (Definition 6.4.9); the definition of the inner product between
two energy-limited real signals (3.1); the definition of $\|s\|_2$ as $\sqrt{\langle s, s \rangle}$; and the
definition of orthonormality of the functions $\phi_1, \ldots, \phi_m$ (Definition 4.6.1).

Proposition 25.15.2 (Key Properties of White Gaussian Noise). Let (N(t)) be
white Gaussian noise of double-sided spectral density $N_0/2$ with respect to the
bandwidth W.



11 As we have noted in the paragraph preceding Definition 25.9.1, Proposition 25.7.3 can
be strengthened to also guarantee measurability. Every nonnegative, symmetric, and integrable 
function is the PSD of some measurable, stationary, and Gaussian SP whose autocovariance 
function is continuous. 
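(i) If s is any integrable function that is bandlimited to W Hz, then

\[ \int_{-\infty}^{\infty} N(t)\, s(t)\, dt \sim \mathcal{N}\Bigl(0, \frac{N_0}{2}\, \|s\|_2^2\Bigr). \]

(ii) If $s_1, \ldots, s_m$ are integrable functions that are bandlimited to W Hz, then the
m random variables

\[ \int_{-\infty}^{\infty} N(t)\, s_1(t)\, dt,\ \ldots,\ \int_{-\infty}^{\infty} N(t)\, s_m(t)\, dt \]

are jointly Gaussian centered random variables of covariance matrix

\[ \frac{N_0}{2} \begin{pmatrix} \langle s_1, s_1 \rangle & \langle s_1, s_2 \rangle & \cdots & \langle s_1, s_m \rangle \\ \langle s_2, s_1 \rangle & \langle s_2, s_2 \rangle & \cdots & \langle s_2, s_m \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle s_m, s_1 \rangle & \langle s_m, s_2 \rangle & \cdots & \langle s_m, s_m \rangle \end{pmatrix}. \]

(iii) If $\phi_1, \ldots, \phi_m$ are integrable functions that are bandlimited to W Hz and are
orthonormal, then the random variables

\[ \int_{-\infty}^{\infty} N(t)\, \phi_1(t)\, dt,\ \ldots,\ \int_{-\infty}^{\infty} N(t)\, \phi_m(t)\, dt \]

are IID $\mathcal{N}(0, N_0/2)$.

(iv) If s is any integrable function that is bandlimited to W Hz, and if $K_{NN}$ is the
autocovariance function of (N(t)), then

\[ K_{NN} \star s = \frac{N_0}{2}\, s. \tag{25.124} \]

(v) If s is an integrable function that is bandlimited to W Hz, then for every
epoch $t \in \mathbb{R}$

\[ \operatorname{Cov}\Bigl[\int_{-\infty}^{\infty} N(\sigma)\, s(\sigma)\, d\sigma,\ N(t)\Bigr] = \frac{N_0}{2}\, s(t). \tag{25.125} \]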



Proof. Parts (i) and (iii) are special cases of Part (ii), so it suffices to prove
Parts (ii), (iv), and (v). We begin with Part (ii). We first note that since $\{s_j\}$
are assumed to be integrable and bandlimited to W Hz, and since Note 6.4.12
guarantees that every bandlimited integrable signal is also of finite energy, it follows
that the functions $\{s_j\}$ are energy-limited and the inner products $\langle s_j, s_k \rangle$ are
well-defined. By (25.89)

\begin{align*}
\operatorname{Cov}\Bigl[\int_{-\infty}^{\infty} N(t)\, s_j(t)\, dt,\ \int_{-\infty}^{\infty} N(t)\, s_k(t)\, dt\Bigr]
&= \int_{-\infty}^{\infty} S_{NN}(f)\, \hat{s}_j(f)\, \hat{s}_k^*(f)\, df \\
&= \int_{-W}^{W} S_{NN}(f)\, \hat{s}_j(f)\, \hat{s}_k^*(f)\, df \\
&= \frac{N_0}{2} \int_{-W}^{W} \hat{s}_j(f)\, \hat{s}_k^*(f)\, df \\
&= \frac{N_0}{2}\, \langle s_j, s_k \rangle, \quad j, k \in \{1, \ldots, m\},
\end{align*}

where the second equality follows because $s_j$ and $s_k$ are bandlimited to W Hz; the
third from (25.123); and the final equality from Parseval's Theorem.

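To prove Part (iv), we start with the definition of the convolution and compute

\begin{align*}
(K_{NN} \star s)(t) &= \int_{-\infty}^{\infty} s(\tau)\, K_{NN}(t-\tau)\, d\tau \\
&= \int_{-\infty}^{\infty} s(\tau) \int_{-\infty}^{\infty} S_{NN}(f)\, e^{i 2\pi f (t-\tau)}\, df\, d\tau \\
&= \int_{-\infty}^{\infty} S_{NN}(f)\, \hat{s}(f)\, e^{i 2\pi f t}\, df \\
&= \int_{-W}^{W} S_{NN}(f)\, \hat{s}(f)\, e^{i 2\pi f t}\, df \\
&= \frac{N_0}{2} \int_{-W}^{W} \hat{s}(f)\, e^{i 2\pi f t}\, df \\
&= \frac{N_0}{2}\, s(t), \quad t \in \mathbb{R},
\end{align*}

where the second equality follows from the definition of the PSD of (N(t))
(Definition 25.7.2); the third by Proposition 6.2.5; the fourth because s is, by assumption,
bandlimited to W Hz (Proposition 6.4.10 cf. (c)); the fifth from our assumption
that (N(t)) is white with respect to the bandwidth W (25.123); and the final
equality from Proposition 6.4.10 (cf. (b)).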

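Part (v) now follows from (25.85) and Part (iv). Alternatively, it can be proved
using (25.88) and (25.123) as follows:

\begin{align*}
\operatorname{Cov}\Bigl[\int_{-\infty}^{\infty} N(\sigma)\, s(\sigma)\, d\sigma,\ N(t)\Bigr]
&= \int_{-\infty}^{\infty} S_{NN}(f)\, \hat{s}(f)\, e^{i 2\pi f t}\, df \\
&= \int_{-W}^{W} S_{NN}(f)\, \hat{s}(f)\, e^{i 2\pi f t}\, df \\
&= \frac{N_0}{2} \int_{-W}^{W} \hat{s}(f)\, e^{i 2\pi f t}\, df \\
&= \frac{N_0}{2}\, s(t), \quad t \in \mathbb{R},
\end{align*}

where the first equality follows from (25.88); the second because s is bandlimited
to W Hz (Proposition 6.4.10 cf. (c)); the third from (25.123); and the last from
Proposition 6.4.10 (cf. (b)). □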
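To make Part (ii) concrete, the covariance matrix $(N_0/2)\,\langle s_j, s_k \rangle$ can be computed for two example signals by evaluating the inner products in the frequency domain, as in the last steps of the proof. The sketch below is illustrative only. The two spectra (a triangle and a squared cosine on $[-W, W]$) are hypothetical choices, picked so that the corresponding time-domain signals are real, integrable, and bandlimited to W Hz; the values of W and $N_0$ and all function names are likewise assumptions.

```python
import math

def triangle_spectrum(f, w):
    """FT of a hypothetical signal s_1: 1 - |f|/w on [-w, w], zero elsewhere."""
    return max(0.0, 1.0 - abs(f) / w)

def cosine_spectrum(f, w):
    """FT of a hypothetical signal s_2: cos^2(pi f / (2w)) on [-w, w], zero elsewhere."""
    return math.cos(math.pi * f / (2.0 * w)) ** 2 if abs(f) <= w else 0.0

def inner_product(x_hat, y_hat, w, n=20_000):
    """<x, y> via Parseval: midpoint-rule integral of x_hat(f) y_hat(f) over [-w, w].
    Both spectra used here are real, so the complex conjugate is omitted."""
    step = 2.0 * w / n
    return step * sum(
        x_hat(-w + (k + 0.5) * step, w) * y_hat(-w + (k + 0.5) * step, w)
        for k in range(n)
    )

def covariance_matrix(spectra, w, n0):
    """Matrix with entries (N0/2) <s_j, s_k>, as in Part (ii)."""
    return [[0.5 * n0 * inner_product(a, b, w) for b in spectra] for a in spectra]

# Illustrative values: W = 1 Hz and N0 = 2, so that N0/2 = 1.
K = covariance_matrix([triangle_spectrum, cosine_spectrum], w=1.0, n0=2.0)

if __name__ == "__main__":
    for row in K:
        print(row)
```

For these two spectra the entries have closed forms, $2W/3$, $3W/4$, and $W/2 + 2W/\pi^2$ (each times $N_0/2$), which makes the numerical result easy to sanity-check.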

25.15.2 Other Definitions 

As we noted earlier, our definition of white Gaussian noise is different from the one
given in most textbooks on Digital Communications. The key difference is that we
define whiteness with respect to a certain bandwidth W, whereas most textbooks
do not add this qualifier. Thus, while we require that the PSD $S_{NN}(f)$ be equal
to $N_0/2$ only for frequencies f satisfying $|f| \le W$ (leaving $S_{NN}(f)$ unspecified at
other frequencies), other textbooks require that $S_{NN}(f)$ be equal to $N_0/2$ for all
frequencies $f \in \mathbb{R}$. With our definition of white noise we can only prove that
(25.124) holds for integrable signals that are bandlimited to W Hz, whereas with
the other textbooks' definition one could presumably derive this relationship for
all integrable functions.

We prefer our definition because there does not exist a Gaussian SP (N(t)) whose
PSD is equal to $N_0/2$ at all frequencies. Indeed, the function of frequency that is
equal to $N_0/2$ at all frequencies is not integrable and therefore does not qualify
as a PSD (Definition 25.7.2). Were such a PSD to exist, we would obtain from
(25.30) that such a process would have infinite variance and thus be neither WSS
(Definition 25.4.2) nor Gaussian (Note 25.3.2).

Requiring that (25.124) hold for all integrable (continuous) signals would require
that $K_{NN}$ be given by the product of $N_0/2$ and Dirac's delta, which opens a whole
can of worms. Nevertheless, the reader should be aware that in some books white
noise is defined as a centered, stationary Gaussian noise whose autocovariance
function is given by Dirac's Delta scaled by $N_0/2$ or, equivalently, whose PSD is
equal to $N_0/2$ at all frequencies.

25.15.3 White Noise in Passband 

Definition 25.15.3 (White Gaussian Noise in Passband). We say that (N(t)) is
white Gaussian noise of double-sided power spectral density $N_0/2$ with
respect to the bandwidth W around the carrier frequency $f_c$ if (N(t)) is a
centered, measurable, stationary, Gaussian process that has a PSD $S_{NN}$ satisfying

\[ S_{NN}(f) = \frac{N_0}{2}, \quad \bigl||f| - f_c\bigr| \le \frac{W}{2}, \tag{25.126} \]

and if $f_c > W/2$.

Note 25.15.4. For white Gaussian noise with respect to the bandwidth W around
the carrier frequency $f_c$, all the claims of Proposition 25.15.2 hold provided that
we replace the requirement that the functions s, $\{s_j\}$, and $\{\phi_j\}$ be integrable
functions that are bandlimited to W Hz with the requirement that they be integrable
functions that are bandlimited to W Hz around the carrier frequency $f_c$.
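The frequency set on which (25.126) pins down the PSD, and the freedom left outside it, can be sketched in a few lines. The code below is a hypothetical illustration: `in_passband` encodes the condition $\bigl||f| - f_c\bigr| \le W/2$, and `example_psd` is just one made-up PSD compatible with Definition 25.15.3 (flat in band, with a nonnegative, symmetric, integrable roll-off elsewhere). Neither name nor the particular roll-off comes from the text.

```python
def in_passband(f, fc, w):
    """True if ||f| - fc| <= w/2, i.e. if (25.126) specifies the PSD at f."""
    if fc <= w / 2.0:
        raise ValueError("Definition 25.15.3 requires fc > w/2")
    return abs(abs(f) - fc) <= w / 2.0

def example_psd(f, fc, w, n0):
    """One hypothetical PSD that is white with respect to the bandwidth w
    around fc: N0/2 in band, and (N0/2) / (1 + d^2) at distance d from the band."""
    if in_passband(f, fc, w):
        return n0 / 2.0
    d = abs(abs(f) - fc) - w / 2.0  # distance (in Hz) from the nearest band edge
    return (n0 / 2.0) / (1.0 + d * d)

if __name__ == "__main__":
    fc, w, n0 = 10.0, 4.0, 2.0
    for f in (-11.0, 0.0, 9.0, 12.5):
        print(f, in_passband(f, fc, w), example_psd(f, fc, w, n0))
```

Because the definition leaves the PSD unspecified outside the band, many other out-of-band shapes would serve equally well, provided the result remains nonnegative, symmetric, and integrable.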



25.16 Exercises 

Exercise 25.1 (Constructing a SP from a RV). Let W be a standard Gaussian RV. Define
the continuous-time SP (X(t)) by

\[ X(t) = e^{-|t|}\, W, \quad t \in \mathbb{R}. \]

(i) Is (X(t)) a stationary SP?
(ii) Is (X(t)) a Gaussian SP?
Exercise 25.2 (Delaying and Adding). Let (X(t)) be a stationary Gaussian SP of mean $\mu_X$
and autocovariance function $K_{XX}$. Define

\[ Y(t) = X(t) + X(t - t_D), \quad t \in \mathbb{R}, \]

where $t_D \in \mathbb{R}$ is deterministic.

(i) Is (Y(t)) a Gaussian SP?

(ii) Compute the mean and the autocovariance function of (Y(t)).
(iii) Is (Y(t)) stationary?

Exercise 25.3 (Random Variables and Stochastic Processes). Let the random variables X
and Y be IID $\mathcal{N}(0, \sigma^2)$, and let

\[ Z(t) = X \cos(2\pi t) + Y \sin(2\pi t), \quad t \in \mathbb{R}. \]

(i) Is Z(0.2) Gaussian?
(ii) Is (Z(t)) a Gaussian SP?
(iii) Is it stationary?

Exercise 25.4 (Stochastic Processes through Nonlinearities).

(i) Let (X(t)) be a stationary SP and let

\[ Y(t) = g(X(t)), \quad t \in \mathbb{R}, \]

where $g\colon \mathbb{R} \to \mathbb{R}$ is some (Borel measurable) deterministic function. Show that the
SP (Y(t)) is stationary. Under what conditions is (Y(t)) WSS?

(ii) Let (X(t)) be a centered stationary Gaussian SP of autocovariance function $K_{XX}$.
Let $Y(t) = \operatorname{sgn}(X(t))$, where $\operatorname{sgn}(\xi)$ is equal to $+1$ whenever $\xi \ge 0$ and is equal
to $-1$ otherwise. Is (Y(t)) centered? Is it WSS? If so, what is its autocovariance
function?

Hint: For Part (ii) recall Exercise 23.18.

Exercise 25.5 (WSS Stochastic Processes). Let A and B be IID random variables taking
on the values $\pm 1$ equiprobably. Define the SP (Z(t)) as

\[ Z(t) = A \cos(2\pi t) + B \sin(2\pi t), \quad t \in \mathbb{R}. \]

(i) Is the SP (Z(t)) WSS?
(ii) Define the SP (W(t)) by $W(t) = Z^2(t)$. Is (W(t)) WSS?

Exercise 25.6 (Valid Autocovariance Functions). Let $K_{XX}$ and $K_{YY}$ be the autocovariance
functions of some WSS stochastic processes (X(t)) and (Y(t)).

(i) Show that $K_{XX} + K_{YY}$ is an autocovariance function of some WSS SP.
(ii) Repeat for $\tau \mapsto K_{XX}(\tau)\, K_{YY}(\tau)$.

Exercise 25.7 (Time Reversal). Let $K_{XX}$ be the autocovariance function of some WSS
SP $(X(t),\ t \in \mathbb{R})$. Is the time-reversed SP $(\omega, t) \mapsto X(\omega, -t)$ WSS? If so, express its
autocovariance function in terms of $K_{XX}$.
Exercise 25.8 (Classifying Stochastic Processes). Let (X(t)) and (Y(t)) be independent
centered stationary Gaussian stochastic processes of unit variance and autocovariance
functions $K_{XX}$ and $K_{YY}$. Define the stochastic processes (S(t)), (T(t)), (U(t)), (V(t)),
and (W(t)) at every $t \in \mathbb{R}$ as:

\begin{align*}
S(t) &= X(t) + Y(t + \tau_1), & T(t) &= X(t)\, Y(t + \tau_2), \\
U(t) &= X(t) + X(t + \tau_3), & V(t) &= X(t)\, X(t + \tau_4), \\
W(t) &= X(t) + X(-t),
\end{align*}

where $\tau_1, \tau_2, \tau_3, \tau_4 \in \mathbb{R}$ are deterministic. Which of these stochastic processes is Gaussian?
Which is WSS? Which is stationary?

Exercise 25.9 (A Linear Functional of a Gaussian SP). Let $(X(t),\ t \in \mathbb{R})$ be a measurable
stationary Gaussian SP of mean 2 and of autocovariance function $K_{XX}\colon \tau \mapsto \exp(-|\tau|)$.
Compute

]•)■ / X(t)di> 2 . 



Exercise 25.10 (Two Filters). Let (X(t)) be a centered stationary Gaussian SP of
autocovariance function $K_{XX}$ and PSD $S_{XX}$. Define

\[ (Y(t)) = (X(t)) \star h_y, \qquad (Z(t)) = (X(t)) \star h_z, \]

where $h_y, h_z \in \mathcal{L}_1$. Thus, (Y(t)) is the result of passing (X(t)) through a stable filter of
impulse response $h_y$, and similarly (Z(t)).

(i) What is the joint distribution of $Y(t_1)$ and $Z(t_2)$ for given epochs $t_1, t_2 \in \mathbb{R}$?

(ii) Give a necessary and sufficient condition on $h_y$, $h_z$, and $S_{XX}$ for Y(17) to be
independent of Z(17).

(iii) Give a necessary and sufficient condition on $h_y$, $h_z$, and $S_{XX}$ for (Z(t)) to be
independent of (Y(t)).

Exercise 25.11 (Linear Functionals of White Gaussian Noise). Find the distribution of

\[ \int_{0}^{T_s} N(t)\, dt \quad \text{and of} \quad \int_{0}^{\infty} e^{-t}\, N(t)\, dt \]

when $(N(t),\ t \in \mathbb{R})$ is white Gaussian noise of double-sided PSD $N_0/2$ with respect to
the bandwidth of interest. (Ignore the fact that the mappings $t \mapsto I\{0 \le t \le T_s\}$ and
$t \mapsto e^{-t}\, I\{t \ge 0\}$ are not bandlimited.)

Exercise 25.12 (Approximately White SP). Let $(X(t),\ t \in \mathbb{R})$ be a measurable, centered,
stationary, Gaussian SP of autocovariance function

\[ K_{XX}(\tau) = \frac{B N_0}{4}\, e^{-B|\tau|}, \quad \tau \in \mathbb{R}, \]

where $N_0, B > 0$ are given constants. Throughout this problem $N_0$ is fixed.

(i) Plot $K_{XX}$ for several values of B. What does $K_{XX}$ look like when $B \gg 1$? Show
that $K_{XX}(\tau) > 0$ for all $\tau \in \mathbb{R}$; that

\[ \int_{-\infty}^{\infty} K_{XX}(\tau)\, d\tau = \frac{N_0}{2}; \]

and that for every $\delta > 0$,

\[ \lim_{B \to \infty} \int_{-\delta}^{\delta} K_{XX}(\tau)\, d\tau = \frac{N_0}{2}. \]

(In this sense, $K_{XX}$ approximates Dirac's Delta scaled by $N_0/2$ when B is large.)

(ii) Compute $\mathsf{E}[X(t)^2]$. Plot this as a function of B, with $N_0$ held fixed. What happens
when $B \gg 1$?

(iii) Compute the PSD $S_{XX}$. Plot it for several values of B. What does it look like when
$B \gg 1$?

(iv) For the orthonormal signals defined for every $t \in \mathbb{R}$ by

\[ \phi_1(t) = \begin{cases} 1 & \text{if } 0 \le t \le 1, \\ 0 & \text{otherwise,} \end{cases} \qquad \phi_2(t) = \begin{cases} 1 & \text{if } 0 \le t \le \tfrac{1}{2}, \\ -1 & \text{if } \tfrac{1}{2} < t \le 1, \\ 0 & \text{otherwise,} \end{cases} \]

compute $\mathsf{E}[\langle X, \phi_1 \rangle \langle X, \phi_2 \rangle]$. What happens to this expression when $B \gg 1$?



Chapter 26 

Detection in White Gaussian Noise 

26.1 Introduction 

In this chapter we finally address the detection problem in continuous time. The 
setup is described in Section 26.2. The key result of this chapter is that — even 
though in this setup the observation consists of a stochastic process (i.e., a contin- 
uum of random variables) — the problem can be reduced without loss of optimality 
to a finite-dimensional problem where the observation consists of a random vec- 
tor. Before stating this result precisely in Section 26.4, we shall take a detour in 
Section 26.3 to discuss the definition of sufficient statistics when the observation 
consists of a continuous-time SP. The proof of the main result is delayed until Sec- 
tion 26.8. In Section 26.5 we analyze the conditional law of the sufficient statistic 
vector under each of the hypotheses. This analysis enables us in Section 26.6 to 
derive an optimal guessing rule and in Section 26.7 to analyze its performance. Sec- 
tion 26.9 addresses the front-end filter, which is a critical element of any practical 
implementation of the decision rule. Extensions to passband detection are then de- 
scribed in Section 26.10, followed by some examples in Section 26.11. Section 26.12 
treats the problem of detection in "colored" noise, and the chapter concludes with 
a discussion of the detection problem for mean signals that are not bandlimited. 

26.2 Setup 

A discrete random variable M ("message") takes value in the set $\mathcal{M} = \{1, \ldots, \mathsf{M}\}$,
where $\mathsf{M} \ge 2$, according to the a priori probabilities

\[ \pi_m = \Pr[M = m], \quad m \in \mathcal{M}, \tag{26.1} \]

where $\pi_1, \ldots, \pi_{\mathsf{M}}$ are positive¹

\[ \pi_m > 0, \quad m \in \mathcal{M} \tag{26.2} \]



1 There is no loss in generality in addressing the detection problem only for strictly positive 
priors. Hypotheses that have a zero prior can be ignored at the receiver without loss in optimality. 
and sum to one

\[ \sum_{m \in \mathcal{M}} \pi_m = 1. \tag{26.3} \]

The observation consists of the continuous-time SP $(Y(t),\ t \in \mathbb{R})$, which, conditional
on $M = m$, can be expressed as

\[ Y(t) = s_m(t) + N(t), \quad t \in \mathbb{R}, \tag{26.4} \]

where the "mean signals" $s_1, \ldots, s_{\mathsf{M}}$ are real, deterministic, integrable signals that
are bandlimited to W Hz (Definition 6.4.9), and where the "noise" (N(t)) is independent
of M and is white Gaussian noise of double-sided spectral density $N_0/2$
with respect to the bandwidth W (Definition 25.15.1). Based on the observation
(Y(t)) we wish to guess M with the smallest possible probability of error.²

26.3 Sufficient Statistics when Observing a SP 

The definition of sufficient statistics for the infinite-dimensional hypothesis testing 
problem where the observation consists of a SP is conceptually very similar to the 
definition in the finite-dimensional case where the observation consists of a random 
vector (Definition 22.2.1). But some new technical difficulties do arise. Foremost is 
that we cannot speak of the probability density function (in the usual sense) of the 
observation given each of the hypotheses. 3 Consequently, we need a new definition 
that does not involve such densities. 

26.3.1 Definition of Sufficient Statistics 

Loosely speaking, a sufficient statistic for guessing a RV M taking value in the
finite set $\mathcal{M}$ based on an observation consisting of a SP (Y(t)) is a random vector
$\mathbf{T} = (T^{(1)}, \ldots, T^{(d)})^{\mathsf{T}}$ that satisfies two conditions. The first is that it can be
computed from the observed SP, and the second is that, once we are given $\mathbf{T}$, any
finite number $\eta \in \mathbb{N}$ of samples $Y(t_1), \ldots, Y(t_\eta)$ of the observation are irrelevant
for guessing M. Thus, once $\mathbf{T}$ has been revealed to us, our optimal guess for M
will not be improved if we are additionally given the values of (Y(t)) at any finite
number of (deterministic) epochs.

Recall the definition of the $\sigma$-algebra generated by the SP (Y(t)) (Definition 25.2.2)
and the definition of irrelevant data (Definition 22.5.1).

Definition 26.3.1 (Sufficient Statistic: Observable SP). We say that the random 
vector T forms a sufficient statistic for guessing the RV M taking value in the 
finite set M based on the observed SP (Y(t)) if the following two conditions hold: 



2 In mathematical terms we are looking for a mapping from the set of all sample-paths of
(Y(t)) to $\mathcal{M}$ that is measurable with respect to the $\sigma$-algebra generated by (Y(t)) and that
minimizes the probability of error among all such functions.

3 One could, instead, speak of the Radon-Nikodym derivative with respect to a reference
measure, but we prefer not to pursue this approach.
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1) T is measurable with respect to the a-algebra generated by (Y(t)). 

2) For every r\ G N and every choice of the epochs £i, . . . ,t v G M., the n-tuple 
(Y(ti), . . . , Y(t v )j is irrelevant for guessing M based on T. 

Condition 2) is equivalent to 

$$M \multimap\!\!- T \multimap\!\!- \bigl(Y(t_1), \ldots, Y(t_\eta)\bigr) \tag{26.5}$$

forming a Markov chain for any prior on M. 

As we shall see in Section 26.4, such a sufficient statistic can always be found for 
the setup described in Section 26.2. 

26.3.2 Consequences of Sufficiency 

It would have been nice if, in analogy with Proposition 22.2.2, we could have said 
that if T forms a sufficient statistic for guessing M based on the observed SP (Y(t)), 
then the best performance in guessing M based on (Y(t)) can be achieved by a 
decision rule that bases its decision on T. This statement is almost correct, but it 
requires a qualification. 

A pathological example that demonstrates the need for a qualification is the
following. Suppose that M takes on the values 1 and 2 equiprobably and that R is a
RV that is independent of M and that has a density. For example, R could be a
mean-one exponential. Suppose further that, conditional on M = 1, the observed
SP (Y(t)) is deterministically zero, and that, conditional on M = 2, the observed
SP is zero at all times $t \in \mathbb{R}$ except at time R when it takes on the value 1. In
this case the conditional law of $(Y(t_1), \ldots, Y(t_\eta))$ does not depend on whether the
conditioning is on M = 1 or on M = 2. Thus, if we define the RV T to equal 17
deterministically, then T forms a sufficient statistic for guessing M based on (Y(t)).
The smallest probability of a guessing error based on T is 1/2. 4 Nevertheless, a
detector that guesses "M = 1" if the observed trajectory is the all-zero function
and "M = 2" if the observed trajectory is discontinuous is correct with probability
one.

It is interesting to note that the latter guessing rule is not measurable with respect
to the σ-algebra generated by (Y(t)). As the next theorem demonstrates, the
qualifier that we need to add is that we only consider guessing rules that are measurable
with respect to the σ-algebra generated by (Y(t)). Subject to this qualifier, if T is
sufficient, then there is no loss in optimality in basing our guess on T only.

Theorem 26.3.2. Consider the multi-hypothesis testing problem of guessing a RV M
taking value in the set $\mathcal{M} = \{1, \ldots, \mathsf{M}\}$ based on an observation consisting of a
SP $(Y(t),\ t \in \mathbb{R})$. Let T be a random vector that forms a sufficient statistic for
guessing M based on (Y(t)). Then no decision rule that is measurable with respect
to the σ-algebra generated by (Y(t)) can have a lower probability of error than an
optimal rule for guessing M based on T.



4 This is also the smallest probability of error in guessing M based on $(Y(t_1), \ldots, Y(t_\eta))$,
irrespective of the (finite) value of the positive integer η and of the (deterministic) choice of the
epochs $t_1, \ldots, t_\eta$.




Proof. Let $\phi(\cdot)$ be any decision rule that is measurable with respect to the
σ-algebra generated by (Y(t)), i.e., a decision rule whose disjoint decision sets

$$\mathcal{D}_m \triangleq \phi^{-1}(m), \quad m \in \mathcal{M}$$

are all measurable with respect to this σ-algebra. The conditional probability that
the rule $\phi(\cdot)$ guesses correctly is

$$\Pr\bigl(\phi(\cdot) \text{ is correct} \bigm| M = m\bigr) = \Pr\bigl[(Y(t)) \in \mathcal{D}_m \bigm| M = m\bigr], \quad m \in \mathcal{M}. \tag{26.6}$$

We shall show that $\phi(\cdot)$ can be approximated by a decision rule $\tilde{\phi}(\cdot)$ that bases its
decision on a finite number of samples $Y(t_1), \ldots, Y(t_\eta)$, where $\eta \in \mathbb{N}$ and where
$t_1, \ldots, t_\eta \in \mathbb{R}$ are deterministic epochs. The approximation is in the sense that,
conditional on each $m \in \mathcal{M}$, the probability of success of $\tilde{\phi}(\cdot)$ is within ε of that
of $\phi(\cdot)$. We shall then show that the best decision rule based on T is at least as
good as $\tilde{\phi}(\cdot)$ and is thus also within ε of $\phi(\cdot)$. Since these steps will be performed
for an arbitrary ε > 0, and since the performance of the best decoder based on T
does not depend on ε, this will demonstrate that $\phi(\cdot)$ is no better than the best
decision rule based on T. And since $\phi(\cdot)$ here is an arbitrary measurable decision
rule, it will follow that no measurable decision rule can outperform an optimal rule
based on T and the theorem will be proved.

To follow this outline we first need some basic set-theoretic notation. Given two
sets A and B we denote by $A \setminus B$ the set consisting of those elements of A that
are not in B. We denote by $A \,\triangle\, B$ the symmetric set difference between A and B
consisting of those elements that are in one of the sets but not in the other. Thus,

$$A \,\triangle\, B = (A \setminus B) \cup (B \setminus A).$$ 5

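A standard result from Measure Theory (Halmos, 1950, Exercise (8), Section 14)
guarantees that for every ε > 0 there exist epochs $t_1, \ldots, t_\eta \in \mathbb{R}$ and sets $\mathcal{D}'_1, \ldots, \mathcal{D}'_{\mathsf{M}}$
(not necessarily disjoint) that are all measurable with respect to the σ-algebra
generated by $Y(t_1), \ldots, Y(t_\eta)$ and such that

$$\Pr\bigl[(Y(t)) \in \mathcal{D}_{m'} \,\triangle\, \mathcal{D}'_{m'} \bigm| M = m\bigr] < \frac{\epsilon}{\mathsf{M}}, \quad m, m' \in \mathcal{M}. \tag{26.7}$$

Define now the disjoint sets $\tilde{\mathcal{D}}_1, \ldots, \tilde{\mathcal{D}}_{\mathsf{M}}$ inductively by defining $\tilde{\mathcal{D}}_1 = \mathcal{D}'_1$ and

$$\tilde{\mathcal{D}}_m = \mathcal{D}'_m \setminus \bigcup_{m' < m} \tilde{\mathcal{D}}_{m'}, \quad m \in \{2, \ldots, \mathsf{M}\}. \tag{26.8}$$

By construction, these sets are disjoint. And because $\mathcal{D}'_1, \ldots, \mathcal{D}'_{\mathsf{M}}$ are measurable
with respect to the σ-algebra generated by $Y(t_1), \ldots, Y(t_\eta)$, so are $\tilde{\mathcal{D}}_1, \ldots, \tilde{\mathcal{D}}_{\mathsf{M}}$.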



5 As an aside we mention that the indicator functions of the sets A, B, and $A \,\triangle\, B$ are related
via the relation

$$\mathrm{I}\{x \in A \,\triangle\, B\} = \mathrm{I}\{x \in A\} \oplus \mathrm{I}\{x \in B\},$$

where ⊕ denotes exclusive-or, i.e., mod-2 addition (0 ⊕ 0 = 0, 0 ⊕ 1 = 1, 1 ⊕ 0 = 1, and 1 ⊕ 1 = 0).
This relationship simplifies the proof of some of the key properties of the symmetric set difference,
especially when combined with the analogous relation for intersection

$$\mathrm{I}\{x \in A \cap B\} = \mathrm{I}\{x \in A\}\, \mathrm{I}\{x \in B\},$$

where on the RHS of the above we use mod-2 multiplication.

We next consider a decoder $\tilde{\phi}(\cdot)$ that guesses "M = m" whenever the sample-path
of (Y(t)) is in the set $\tilde{\mathcal{D}}_m$. (If the sample-path of (Y(t)) does not fall in any of the
sets $\{\tilde{\mathcal{D}}_m\}$, then the decoder produces an error flag.) This decoder bases its guess
only on $Y(t_1), \ldots, Y(t_\eta)$ and yet, as we shall next show, succeeds with probability
that is at least within ε of the probability of success of $\phi(\cdot)$, i.e.,

$$\Pr\bigl(\tilde{\phi}(\cdot) \text{ is correct} \bigm| M = m\bigr) \ge \Pr\bigl(\phi(\cdot) \text{ is correct} \bigm| M = m\bigr) - \epsilon, \quad m \in \mathcal{M}. \tag{26.9}$$

This will imply, in particular, that when averaged over M

$$\Pr\bigl(\tilde{\phi}(\cdot) \text{ is correct}\bigr) \ge \Pr\bigl(\phi(\cdot) \text{ is correct}\bigr) - \epsilon, \tag{26.10}$$

irrespective of the prior on M. But by (26.5) an optimal decision rule based on T
is at least as good as $\tilde{\phi}(\cdot)$ and is thus also within ε of $\phi(\cdot)$. 6 Since ε is arbitrary, it
follows that an optimal decision rule based on T is at least as good as $\phi(\cdot)$, thus
proving the theorem.

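To complete the proof it thus remains to prove (26.9). To that end we note that,
since for any sets A and B we have $A \supseteq B \cap A$, it follows that $\mathcal{D}'_m \supseteq \mathcal{D}'_m \cap \mathcal{D}_m$ and
hence, by (26.8),

$$\begin{align}
\tilde{\mathcal{D}}_m &\supseteq (\mathcal{D}'_m \cap \mathcal{D}_m) \setminus \bigcup_{m' < m} \tilde{\mathcal{D}}_{m'} \notag \\
&\supseteq (\mathcal{D}'_m \cap \mathcal{D}_m) \setminus \bigcup_{m' < m} \bigl(\mathcal{D}_{m'} \cup (\mathcal{D}'_{m'} \setminus \mathcal{D}_{m'})\bigr) \tag{26.11} \\
&= (\mathcal{D}'_m \cap \mathcal{D}_m) \setminus \Bigl( \bigcup_{m' < m} \mathcal{D}_{m'} \cup \bigcup_{m' < m} (\mathcal{D}'_{m'} \setminus \mathcal{D}_{m'}) \Bigr) \notag \\
&= \Bigl( (\mathcal{D}'_m \cap \mathcal{D}_m) \setminus \bigcup_{m' < m} \mathcal{D}_{m'} \Bigr) \setminus \Bigl( \bigcup_{m' < m} (\mathcal{D}'_{m'} \setminus \mathcal{D}_{m'}) \Bigr) \notag \\
&= (\mathcal{D}'_m \cap \mathcal{D}_m) \setminus \Bigl( \bigcup_{m' < m} (\mathcal{D}'_{m'} \setminus \mathcal{D}_{m'}) \Bigr) \tag{26.12} \\
&= \bigl(\mathcal{D}_m \setminus (\mathcal{D}_m \setminus \mathcal{D}'_m)\bigr) \setminus \Bigl( \bigcup_{m' < m} (\mathcal{D}'_{m'} \setminus \mathcal{D}_{m'}) \Bigr) \notag \\
&= \mathcal{D}_m \setminus \Bigl( (\mathcal{D}_m \setminus \mathcal{D}'_m) \cup \bigcup_{m' < m} (\mathcal{D}'_{m'} \setminus \mathcal{D}_{m'}) \Bigr), \tag{26.13}
\end{align}$$

where (26.11) follows because $\mathcal{D}_{m'} \cup (\mathcal{D}'_{m'} \setminus \mathcal{D}_{m'}) = \mathcal{D}_{m'} \cup \mathcal{D}'_{m'} \supseteq \mathcal{D}'_{m'}$, and
because, by construction, $\mathcal{D}'_{m'}$ contains $\tilde{\mathcal{D}}_{m'}$ (see (26.8)); where the equality (26.12)
follows because the sets $\{\mathcal{D}_m\}$ are disjoint; and where the other equalities follow
by standard set-theoretic identities. It follows from (26.13) that

$$\mathcal{D}_m \subseteq \tilde{\mathcal{D}}_m \cup (\mathcal{D}_m \setminus \mathcal{D}'_m) \cup \bigcup_{m' < m} (\mathcal{D}'_{m'} \setminus \mathcal{D}_{m'}), \tag{26.14}$$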



6 An optimal decision rule based on T is the Maximum A Posteriori rule that computes
the conditional distribution of M given T. But the Markov condition (26.5) implies that the
conditional law of M given T is the same as the conditional law given T and $(Y(t_1), \ldots, Y(t_\eta))$,
so an optimal decision rule based on T is as good as an optimal decision rule given T and
$(Y(t_1), \ldots, Y(t_\eta))$ and is therefore at least as good as $\tilde{\phi}(\cdot)$, which is based on $(Y(t_1), \ldots, Y(t_\eta))$
alone.

because for arbitrary sets A, B, C, the relation $A \supseteq B \setminus C$ implies that $B \subseteq A \cup C$.
From (26.14) and the Union-of-Events Bound (Theorem 21.5.1) we obtain

$$\begin{aligned}
\Pr\bigl(\phi(\cdot) \text{ is correct} \bigm| M = m\bigr)
&= \Pr\bigl[(Y(t)) \in \mathcal{D}_m \bigm| M = m\bigr] \\
&\le \Pr\bigl[(Y(t)) \in \tilde{\mathcal{D}}_m \bigm| M = m\bigr] + \Pr\bigl[(Y(t)) \in (\mathcal{D}_m \setminus \mathcal{D}'_m) \bigm| M = m\bigr] \\
&\qquad + \sum_{m' < m} \Pr\bigl[(Y(t)) \in \mathcal{D}'_{m'} \setminus \mathcal{D}_{m'} \bigm| M = m\bigr] \\
&\le \Pr\bigl[(Y(t)) \in \tilde{\mathcal{D}}_m \bigm| M = m\bigr] + \sum_{m' \le m} \Pr\bigl[(Y(t)) \in \mathcal{D}_{m'} \,\triangle\, \mathcal{D}'_{m'} \bigm| M = m\bigr] \\
&< \Pr\bigl(\tilde{\phi}(\cdot) \text{ is correct} \bigm| M = m\bigr) + m\, \frac{\epsilon}{\mathsf{M}} \\
&\le \Pr\bigl(\tilde{\phi}(\cdot) \text{ is correct} \bigm| M = m\bigr) + \epsilon, \quad m \in \mathcal{M},
\end{aligned}$$

where the first inequality follows from (26.14) and the Union-of-Events bound; the
second inequality follows because for any two sets A and B we have $A \setminus B \subseteq A \,\triangle\, B$
and also $B \setminus A \subseteq A \,\triangle\, B$; the third inequality from (26.7); and the final inequality
because $m \in \{1, \ldots, \mathsf{M}\}$. This concludes the proof by establishing (26.9). □



26.4 Main Result 

The main result of this chapter is Theorem 26.4.1, which provides a sufficient 
statistic for the setup of Section 26.2. A more general version (Theorem 27.3.1) 
will be proved in Chapter 27. Nevertheless, we have chosen to provide a separate 
proof of Theorem 26.4.1 in Section 26.8 because the proof of this case is simpler. 

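Theorem 26.4.1 (Inner Products with the Mean Signals Suffice). In the setup
of Section 26.2, the random vector

$$\Bigl( \int_{-\infty}^{\infty} Y(t)\, s_1(t)\, dt, \ \ldots, \ \int_{-\infty}^{\infty} Y(t)\, s_{\mathsf{M}}(t)\, dt \Bigr)^{\mathsf{T}} \tag{26.15}$$

forms a sufficient statistic for guessing M based on (Y(t)).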

Proof. See Section 26.8. □ 

Because the RV $\int Y(t)\, s_m(t)\, dt$ can be viewed as a mapping that maps each $\omega \in \Omega$ 7
to the inner product between its trajectory $t \mapsto Y(\omega, t)$ and the signal $t \mapsto s_m(t)$,
we denote this random variable by $\langle Y, s_m \rangle$. With this notation, the main result
is that the M inner products

$$\langle Y, s_1 \rangle, \ \ldots, \ \langle Y, s_{\mathsf{M}} \rangle \tag{26.16}$$

form a sufficient statistic for guessing M based on (Y(t)).



7 Here, as throughout, $(\Omega, \mathcal{F}, P)$ denotes the probability space over which all the random
variables and stochastic processes in the setup are defined.

Figure 26.1: Computing the inner product between the observed SP and each of
the mean signals and then basing one's decision on these inner products.



This theorem is extremely useful because in combination with Theorem 26.3.2 it
demonstrates that, without loss of optimality, we can limit ourselves to guessing
rules that use the observation to compute the M inner products (26.16) and that
then base their decision on these inner products. Figure 26.1 illustrates such
decision rules. The theorem thus helps us to convert the guessing problem from one
with a continuous-time observation (Y(t)) to a problem of the kind we addressed
in Section 21.3, where the observable is a finite-dimensional random vector (the
vector of inner products, which takes values in $\mathbb{R}^{\mathsf{M}}$).

We can generalize Theorem 26.4.1 using the linearity of the stochastic integral
(Lemma 25.10.3). This generalization allows us to further reduce the dimension of
the sufficient statistic vector from the number of messages M to the dimension d
of the linear subspace $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$ spanned by the mean signals $s_1, \ldots, s_{\mathsf{M}}$:

$$d = \dim\bigl(\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})\bigr). \tag{26.17}$$

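Corollary 26.4.2. Let $\tilde{s}_1, \ldots, \tilde{s}_n$ be integrable signals that are bandlimited to W
Hz. 8 If every mean signal can be written as a linear combination of $(\tilde{s}_1, \ldots, \tilde{s}_n)$,
then the random n-vector

$$\Bigl( \int_{-\infty}^{\infty} Y(t)\, \tilde{s}_1(t)\, dt, \ \ldots, \ \int_{-\infty}^{\infty} Y(t)\, \tilde{s}_n(t)\, dt \Bigr)^{\mathsf{T}} \tag{26.18}$$

forms a sufficient statistic for guessing M based on (Y(t)).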

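Proof of the corollary. By the corollary's hypothesis, every mean signal $s_m$ can be
written as a linear combination of the signals $\{\tilde{s}_j\}_{j=1}^{n}$. Thus, to each $m \in \mathcal{M}$ there
correspond n coefficients (not necessarily unique) $\alpha_m^{(1)}, \ldots, \alpha_m^{(n)}$ such that

$$s_m = \sum_{j=1}^{n} \alpha_m^{(j)}\, \tilde{s}_j, \quad m \in \mathcal{M}. \tag{26.19}$$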

8 The result also holds if the signals are not bandlimited, but we prefer to assume that they

Consequently, by the linearity of integration (Lemma 25.10.3), we can compute the
integrals appearing in (26.15) from the random n-vector (26.18) using the relation

$$\int_{-\infty}^{\infty} Y(t)\, s_m(t)\, dt = \sum_{j=1}^{n} \alpha_m^{(j)} \int_{-\infty}^{\infty} Y(t)\, \tilde{s}_j(t)\, dt, \quad m \in \mathcal{M}.$$



From the vector in (26.18) we can thus compute the vector in (26.15), and since 
the latter forms a sufficient statistic (Theorem 26.4.1) it follows that the former 
must also form a sufficient statistic (Proposition 22.4.2). 9 □ 

We note that Corollary 26.4.2 does, indeed, generalize the theorem because, by
choosing n = M with $\tilde{s}_m = s_m$ for all $m \in \mathcal{M}$, we recover the theorem from
the corollary. More interesting is the case where $(\tilde{s}_1, \ldots, \tilde{s}_n)$ forms a basis for
$\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$. In this case the corollary provides a sufficient statistic consisting
of a random d-vector, where d is the dimension of $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$. This reduces
the number of inner products needed to implement the receiver from M to d. As
we shall see, it is particularly convenient to choose $(\tilde{s}_1, \ldots, \tilde{s}_n)$ as an orthonormal
basis for $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$. In this case we shall prefer to refer to $\{\tilde{s}_j\}$ as $\{\phi_\ell\}_{\ell=1}^{d}$,
where, as before, d is the dimension of $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$.
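
The reduction from M inner products to d can be illustrated numerically. The following is only a sketch under stated assumptions: the signals are replaced by samples on a uniform grid, inner products are approximated by Riemann sums, and the function name `orthonormal_basis`, the tolerance `tol`, and the four example waveforms are hypothetical choices made for illustration (they are not taken from the text, and time-limited sinusoids merely stand in for bandlimited signals).

```python
import numpy as np

def orthonormal_basis(signals, dt, tol=1e-9):
    """Gram-Schmidt on sampled signals (one signal per row).

    The inner product <u, v> is approximated by sum(u * v) * dt.
    Signals whose residual energy is below `tol` are treated as
    linearly dependent on the earlier ones and are skipped.
    """
    basis = []
    for s in signals:
        r = np.array(s, dtype=float)
        for phi in basis:
            r = r - np.sum(r * phi) * dt * phi   # remove the component along phi
        energy = np.sum(r * r) * dt
        if energy > tol:
            basis.append(r / np.sqrt(energy))
    return np.array(basis)

# Hypothetical example: M = 4 mean signals that span only d = 2 dimensions.
dt = 1e-3
t = np.arange(0.0, 1.0, dt)
a = np.sin(2 * np.pi * t)
b = np.cos(2 * np.pi * t)
signals = np.array([a, b, a + b, a - 2 * b])

phi = orthonormal_basis(signals, dt)
d = phi.shape[0]                      # dimension of the span
gram = phi @ phi.T * dt               # should be close to the d x d identity
alpha = signals @ phi.T * dt          # coefficients <s_m, phi_l>
reconstructed = alpha @ phi           # s_m recovered from its d coefficients
```

In this example the d = 2 inner products with the rows of `phi` determine all four inner products with the mean signals, which is the content of the corollary in this special case.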



26.5 Analyzing the Sufficient Statistic 



26.5.1 The Conditional Law of the Sufficient Statistic 



Having reduced the guessing problem from one where the observation is a SP to 
one where it is a random vector, we can proceed to derive an optimal decision rule 
based on this vector. To derive such a rule we need the conditional distribution 
of this vector conditional on each of the hypotheses. Fortunately, this is easy 
for the problem at hand, because the Gaussianity of the noise (N(t)) implies 
that, conditional on each of the hypotheses, the vectors in (26.15) and (26.18) 
are Gaussian (Theorem 25.12.1). Their conditional distributions are thus fully 
specified by their mean vectors and covariance matrices. 

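The calculation of the mean vectors is straightforward. Indeed, by linearity and
by Proposition 25.10.1,

$$\begin{aligned}
\mathrm{E}\Bigl[ \int_{-\infty}^{\infty} Y(t)\, s_j(t)\, dt \Bigm| M = m \Bigr]
&= \mathrm{E}\Bigl[ \int_{-\infty}^{\infty} \bigl(s_m(t) + N(t)\bigr)\, s_j(t)\, dt \Bigr] \\
&= \langle s_m, s_j \rangle + \mathrm{E}\Bigl[ \int_{-\infty}^{\infty} N(t)\, s_j(t)\, dt \Bigr] \\
&= \langle s_m, s_j \rangle, \quad j, m \in \mathcal{M}.
\end{aligned}$$

Thus, for every $m \in \mathcal{M}$, the conditional mean of the vector in (26.15), conditional
on M = m, is the vector

$$\bigl( \langle s_m, s_1 \rangle, \ \ldots, \ \langle s_m, s_{\mathsf{M}} \rangle \bigr)^{\mathsf{T}}. \tag{26.20}$$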



9 For the pedantic reader one should add that, by Proposition 25.10.1, the vector in (26.18)
is measurable with respect to the σ-algebra generated by (Y(t)).

The calculation of the conditional covariance matrices requires a simple application
of Proposition 25.15.2. It yields that the covariance matrix of the vector in (26.15),
conditional on M = m, is given by the M × M matrix

$$\frac{N_0}{2}
\begin{pmatrix}
\langle s_1, s_1 \rangle & \langle s_1, s_2 \rangle & \cdots & \langle s_1, s_{\mathsf{M}} \rangle \\
\langle s_2, s_1 \rangle & \langle s_2, s_2 \rangle & \cdots & \langle s_2, s_{\mathsf{M}} \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle s_{\mathsf{M}}, s_1 \rangle & \langle s_{\mathsf{M}}, s_2 \rangle & \cdots & \langle s_{\mathsf{M}}, s_{\mathsf{M}} \rangle
\end{pmatrix}. \tag{26.21}$$

Note that the conditional covariance matrix does not depend on the hypothesis m
on which we are conditioning, because this hypothesis only influences the mean
of (Y(t)).

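More generally, for the sufficient statistic vector in (26.18) we obtain that for every
$m \in \mathcal{M}$ the conditional distribution of this vector, conditional on M = m, is
Gaussian with the n-dimensional mean vector

$$\bigl( \langle s_m, \tilde{s}_1 \rangle, \ \ldots, \ \langle s_m, \tilde{s}_n \rangle \bigr)^{\mathsf{T}} \tag{26.22}$$

and the n × n covariance matrix

$$\frac{N_0}{2}
\begin{pmatrix}
\langle \tilde{s}_1, \tilde{s}_1 \rangle & \langle \tilde{s}_1, \tilde{s}_2 \rangle & \cdots & \langle \tilde{s}_1, \tilde{s}_n \rangle \\
\langle \tilde{s}_2, \tilde{s}_1 \rangle & \langle \tilde{s}_2, \tilde{s}_2 \rangle & \cdots & \langle \tilde{s}_2, \tilde{s}_n \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle \tilde{s}_n, \tilde{s}_1 \rangle & \langle \tilde{s}_n, \tilde{s}_2 \rangle & \cdots & \langle \tilde{s}_n, \tilde{s}_n \rangle
\end{pmatrix}. \tag{26.23}$$

(The assumption that the signals $\{\tilde{s}_j\}$ are bandlimited to W Hz is not needed in
Corollary 26.4.2, but it is needed for the above conditional law to hold.)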
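
As a numerical illustration of (26.22) and (26.23), the sketch below computes the conditional mean vectors and the common covariance matrix from sampled waveforms. It is a minimal sketch, not a definitive implementation: the function name `conditional_law`, the sampling grid, the value of N0, and the example signals are all assumptions made here, and inner products are approximated by Riemann sums.

```python
import numpy as np

def conditional_law(mean_signals, test_signals, dt, N0):
    """Return (means, cov) for the vector of inner products <Y, s~_j>.

    means[m, j] approximates <s_m, s~_j>, cf. (26.22);
    cov[i, j] approximates (N0 / 2) <s~_i, s~_j>, cf. (26.23).
    Rows of both input arrays are signals sampled with spacing dt.
    """
    means = mean_signals @ test_signals.T * dt
    cov = 0.5 * N0 * (test_signals @ test_signals.T) * dt
    return means, cov

# Hypothetical example with an orthonormal pair of test signals.
dt = 1e-3
t = np.arange(0.0, 1.0, dt)
phi1 = np.sqrt(2.0) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2.0) * np.cos(2 * np.pi * t)
mean_signals = np.array([phi1, -phi1, phi2])   # three hypothetical mean signals
test_signals = np.array([phi1, phi2])
N0 = 2.0

means, cov = conditional_law(mean_signals, test_signals, dt, N0)
```

The covariance is the same under every hypothesis; only the row of `means` changes with m. With orthonormal test signals, as in this example, the covariance reduces to N0/2 times the identity.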

26.5.2 It Is all in the Geometry! 

It is interesting to note that the conditional mean vector in (26.20) and the
conditional covariance matrix in (26.21) are fully determined by $N_0/2$ and by the inner
products

$$\bigl\{ \langle s_{m'}, s_{m''} \rangle \bigr\}_{m', m'' \in \mathcal{M}}; \tag{26.24}$$

the PSD of the noise (N(t)) outside the band $f \in [-W, W]$ is immaterial. Similarly,
except in determining the pairwise inner products, the exact waveforms of the mean
signals are immaterial. Since the conditional distribution of the sufficient statistic
vector (26.15) is Gaussian, and since the distribution of a Gaussian vector is fully
determined by its mean vector and its covariance matrix (Theorem 23.6.7), we
can conclude:

Note 26.5.1. The conditional distribution of the sufficient statistic vector (26.15)
given each of the hypotheses is determined by $N_0$ and by the inner products in
(26.24). The PSD of the noise at frequencies outside the band $[-W, W]$ is
immaterial.

Note, however, that the calculation of the sufficient statistic from the
observation (Y(t)) requires more than just knowledge of the inner products in (26.24); the
calculation of the vector (26.15) requires knowledge of the waveforms $s_1, \ldots, s_{\mathsf{M}}$.

Since an optimal decision rule for guessing M based on (Y(t)) can be based on the
sufficient statistic (Theorem 26.3.2), and since the conditional distribution of the
sufficient statistic given each of the hypotheses depends only on $N_0$ and the inner
products (26.24), it follows that:

Proposition 26.5.2. For the setup of Section 26.2, the minimal probability of error
that can be achieved in guessing M based on (Y(t)) is determined by $N_0$, by the
inner products (26.24), and by the prior $\{\pi_m\}_{m \in \mathcal{M}}$.

26.5.3 Orthonormal Bases 

The conditional distribution of the sufficient statistic given each of the hypotheses
is easier to manipulate if we choose the functions $\{\tilde{s}_j\}$ in (26.18) to form an
orthonormal basis for the linear subspace spanned by the mean signals. In this case
we denote the basis functions by $\phi_1, \ldots, \phi_d$, so

$$\operatorname{span}(\phi_1, \ldots, \phi_d) = \operatorname{span}(s_1, \ldots, s_{\mathsf{M}}), \tag{26.25a}$$

$$\langle \phi_{\ell'}, \phi_{\ell''} \rangle = \mathrm{I}\{\ell' = \ell''\}, \quad \ell', \ell'' \in \{1, \ldots, d\}, \tag{26.25b}$$

where d is the dimension of the linear subspace spanned by the mean signals (26.17).
Such functions $\phi_1, \ldots, \phi_d$ can be found, for example, using the Gram-Schmidt
procedure (Section 4.6.6). 10 We denote the sufficient statistic vector (26.18) by
$T = (T^{(1)}, \ldots, T^{(d)})^{\mathsf{T}}$:

$$T^{(\ell)} = \int_{-\infty}^{\infty} Y(t)\, \phi_\ell(t)\, dt = \langle Y, \phi_\ell \rangle, \quad \ell = 1, \ldots, d. \tag{26.26}$$

Figure 26.2 depicts a block diagram of a circuit that computes the inner products 
of the received waveform with each of the basis signals. 

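By (26.22) and (26.23) we obtain that for every $m \in \mathcal{M}$ the conditional distribution
of T given that M = m is Gaussian with mean

$$\mathrm{E}[T \mid M = m] = \bigl( \langle s_m, \phi_1 \rangle, \ \ldots, \ \langle s_m, \phi_d \rangle \bigr)^{\mathsf{T}} \tag{26.27}$$

and covariance matrix $(N_0/2)\, \mathsf{I}_d$, where $\mathsf{I}_d$ denotes the d × d identity matrix. The
components of T are thus conditionally independent and of equal variance $N_0/2$
(but not of equal mean). Consequently, we can express the conditional density
of T, conditional on M = m, at every point $\mathbf{t} = (t^{(1)}, \ldots, t^{(d)})^{\mathsf{T}} \in \mathbb{R}^d$ using this
conditional independence and the explicit form of the univariate Gaussian density
(19.6) as

$$\begin{aligned}
f_{T \mid M = m}(\mathbf{t})
&= \prod_{\ell=1}^{d} \frac{1}{\sqrt{\pi N_0}} \exp\Bigl( -\frac{\bigl(t^{(\ell)} - \langle s_m, \phi_\ell \rangle\bigr)^2}{N_0} \Bigr) \\
&= \frac{1}{(\pi N_0)^{d/2}} \exp\Bigl( -\frac{1}{N_0} \sum_{\ell=1}^{d} \bigl(t^{(\ell)} - \langle s_m, \phi_\ell \rangle\bigr)^2 \Bigr), \quad \mathbf{t} \in \mathbb{R}^d.
\end{aligned} \tag{26.28}$$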



10 Since the mean signals are bandlimited, the only zero-energy element of $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$
is the all-zero signal (Note 6.4.2). Consequently, $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$ has an orthonormal basis
(Proposition 4.6.10), which can be found using the Gram-Schmidt procedure (Section 4.6.6).

Figure 26.2: Computing the inner products $T^{(\ell)} = \langle Y, \phi_\ell \rangle$ for $\ell = 1, \ldots, d$ from
the received waveform.



Note that with proper translation (Table 26.1) the conditional distribution of T is
very similar to the one we addressed in Section 21.6; see (21.50). In fact, it is a
special case of the distribution studied in Section 21.6: Y there corresponds to T
here; J there corresponds to d here; $\sigma^2$ there corresponds to $N_0/2$ here; and the
mean vector $s_m$ associated with Message m there corresponds to the vector

$$\bigl( \langle s_m, \phi_1 \rangle, \ \ldots, \ \langle s_m, \phi_d \rangle \bigr)^{\mathsf{T}} \tag{26.29}$$

here. Consequently, we can use the results from Section 21.6 and, more specifically,
Proposition 21.6.1, to derive an optimal decision rule for guessing M based on T.
We adopt this approach when we next derive an optimal decision rule for our setup.





                                              In Section 21.6                                  Here

number of components of observed vector       $J$                                              $d$

variance of noise added to each component     $\sigma^2$                                       $N_0/2$

number of hypotheses                          $\mathsf{M}$                                     $\mathsf{M}$

conditional mean of observation               $(s_m^{(1)}, \ldots, s_m^{(J)})^{\mathsf{T}}$    $(\langle s_m, \phi_1 \rangle, \ldots, \langle s_m, \phi_d \rangle)^{\mathsf{T}}$
given M = m

sum of squared components of mean vector      $\|s_m\|^2$                                      $\sum_{\ell=1}^{d} \langle s_m, \phi_\ell \rangle^2 = \int_{-\infty}^{\infty} s_m^2(t)\, dt$

Table 26.1: The setup in Section 21.6 and here.



26.6 Optimal Guessing Rule 



We are finally ready to derive an optimal guessing rule for our setup. Recall that, by
Corollary 26.4.2, if $(\phi_1, \ldots, \phi_d)$ is an orthonormal basis for the linear space spanned
by the mean signals, then the random vector T defined in (26.26) forms a sufficient
statistic for guessing M based on (Y(t)). Consequently, by Theorem 26.3.2, it is
optimal to use the observation (Y(t)) to compute the vector T and to then use the
MAP rule to guess M based on T (Theorem 21.3.3). We shall do just that. We
first present the resulting rule in terms of the orthonormal basis $(\phi_1, \ldots, \phi_d)$ and
then show that the rule does not depend on the specific choice of the orthonormal
basis.

In deriving the decision rule we shall repeatedly use the fact that if $(\phi_1, \ldots, \phi_d)$ is
an orthonormal basis for $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$, then, by Proposition 4.6.4,

$$s_m = \sum_{\ell=1}^{d} \langle s_m, \phi_\ell \rangle\, \phi_\ell, \quad m \in \mathcal{M}, \tag{26.30}$$

and, by Proposition 4.6.9,

$$\|s_m\|_2^2 = \sum_{\ell=1}^{d} \langle s_m, \phi_\ell \rangle^2, \quad m \in \mathcal{M}. \tag{26.31}$$



26.6.1 The Decision Rule in Terms of $(\phi_1, \ldots, \phi_d)$

As we have noted, the conditional density $f_{T \mid M = m}(\cdot)$ in (26.28) is of the form we
discussed in Section 21.6 (Table 26.1). By Proposition 21.6.1 we thus obtain:

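Theorem 26.6.1. Let M, $s_1, \ldots, s_{\mathsf{M}}$, and (Y(t)) be as in our setup, and let the
d-tuple $(\phi_1, \ldots, \phi_d)$ be an orthonormal basis for $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$.

(i) The decision rule that guesses uniformly at random from among all the
messages $\hat{m} \in \mathcal{M}$ for which

$$\ln \pi_{\hat{m}} - \frac{\sum_{\ell=1}^{d} \bigl( \langle Y, \phi_\ell \rangle - \langle s_{\hat{m}}, \phi_\ell \rangle \bigr)^2}{N_0}
= \max_{m' \in \mathcal{M}} \biggl\{ \ln \pi_{m'} - \frac{\sum_{\ell=1}^{d} \bigl( \langle Y, \phi_\ell \rangle - \langle s_{m'}, \phi_\ell \rangle \bigr)^2}{N_0} \biggr\} \tag{26.32}$$

minimizes the probability of a guessing error.

(ii) If M has a uniform distribution, then this rule does not depend on the value
of $N_0$. It chooses uniformly at random from among all the messages $\hat{m} \in \mathcal{M}$
for which

$$\sum_{\ell=1}^{d} \bigl( \langle Y, \phi_\ell \rangle - \langle s_{\hat{m}}, \phi_\ell \rangle \bigr)^2
= \min_{m' \in \mathcal{M}} \biggl\{ \sum_{\ell=1}^{d} \bigl( \langle Y, \phi_\ell \rangle - \langle s_{m'}, \phi_\ell \rangle \bigr)^2 \biggr\}. \tag{26.33}$$

(iii) If M has a uniform distribution and, in addition, the mean signals are of
equal energy, i.e.,

$$\|s_1\|_2^2 = \|s_2\|_2^2 = \cdots = \|s_{\mathsf{M}}\|_2^2, \tag{26.34}$$

then these decision rules are equivalent to the maximum-correlation rule that
guesses uniformly from among all the messages $\hat{m} \in \mathcal{M}$ for which

$$\sum_{\ell=1}^{d} \langle s_{\hat{m}}, \phi_\ell \rangle \langle Y, \phi_\ell \rangle
= \max_{m' \in \mathcal{M}} \sum_{\ell=1}^{d} \langle s_{m'}, \phi_\ell \rangle \langle Y, \phi_\ell \rangle. \tag{26.35}$$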

Proof. The theorem follows directly from Proposition 21.6.1. For Part (iii) we
need to note that, by (26.31), Condition (26.34) is equivalent to the condition

$$\sum_{\ell=1}^{d} \langle s_1, \phi_\ell \rangle^2 = \sum_{\ell=1}^{d} \langle s_2, \phi_\ell \rangle^2 = \cdots = \sum_{\ell=1}^{d} \langle s_{\mathsf{M}}, \phi_\ell \rangle^2, \tag{26.36}$$

which is the condition needed in Proposition 21.6.1. □

Note that, because $(\phi_1, \ldots, \phi_d)$ is an orthonormal basis for $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$, the
signals $s_{m'}$ and $s_{m''}$ differ if, and only if, the vectors $(\langle s_{m'}, \phi_1 \rangle, \ldots, \langle s_{m'}, \phi_d \rangle)^{\mathsf{T}}$
and $(\langle s_{m''}, \phi_1 \rangle, \ldots, \langle s_{m''}, \phi_d \rangle)^{\mathsf{T}}$ in $\mathbb{R}^d$ differ. Consequently, by Proposition 21.6.2:

Note 26.6.2. If the mean signals $s_1, \ldots, s_{\mathsf{M}}$ are distinct, then the probability of a
tie, i.e., that more than one message $\hat{m} \in \mathcal{M}$ satisfies (26.32), is zero.

26.6.2 The Decision Rule without Reference to a Basis 

We next derive a representation of our decision rule without reference to a specific 
orthonormal basis. 

Theorem 26.6.3. Consider the problem of guessing M based on (Y(t)) in our
setup.

(i) The decision rule that guesses uniformly at random from among all the
messages $\hat{m} \in \mathcal{M}$ for which

$$\ln \pi_{\hat{m}} + \frac{2}{N_0} \Bigl( \int_{-\infty}^{\infty} Y(t)\, s_{\hat{m}}(t)\, dt - \frac{1}{2} \int_{-\infty}^{\infty} s_{\hat{m}}^2(t)\, dt \Bigr)
= \max_{m' \in \mathcal{M}} \biggl\{ \ln \pi_{m'} + \frac{2}{N_0} \Bigl( \int_{-\infty}^{\infty} Y(t)\, s_{m'}(t)\, dt - \frac{1}{2} \int_{-\infty}^{\infty} s_{m'}^2(t)\, dt \Bigr) \biggr\} \tag{26.37}$$

minimizes the probability of error.

(ii) If M has a uniform distribution, then this rule does not depend on the value
of $N_0$. It chooses uniformly at random from among all the messages $\hat{m} \in \mathcal{M}$
for which

$$\int_{-\infty}^{\infty} Y(t)\, s_{\hat{m}}(t)\, dt - \frac{1}{2} \int_{-\infty}^{\infty} s_{\hat{m}}^2(t)\, dt
= \max_{m' \in \mathcal{M}} \biggl\{ \int_{-\infty}^{\infty} Y(t)\, s_{m'}(t)\, dt - \frac{1}{2} \int_{-\infty}^{\infty} s_{m'}^2(t)\, dt \biggr\}. \tag{26.38}$$

(iii) If M has a uniform distribution and, in addition, the mean signals are of
equal energy, i.e.,

$$\|s_1\|_2^2 = \|s_2\|_2^2 = \cdots = \|s_{\mathsf{M}}\|_2^2,$$

then these decision rules are equivalent to the maximum-correlation rule that
guesses uniformly from among all the messages $\hat{m} \in \mathcal{M}$ for which

$$\int_{-\infty}^{\infty} Y(t)\, s_{\hat{m}}(t)\, dt = \max_{m' \in \mathcal{M}} \biggl\{ \int_{-\infty}^{\infty} Y(t)\, s_{m'}(t)\, dt \biggr\}. \tag{26.39}$$

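Proof. We shall prove Part (i) using Theorem 26.6.1 (i). To this end we begin by
noting that

$$\ln \pi_m - \frac{1}{N_0} \sum_{\ell=1}^{d} \bigl( \langle Y, \phi_\ell \rangle - \langle s_m, \phi_\ell \rangle \bigr)^2$$

can be expressed by opening the square as

$$\ln \pi_m - \frac{1}{N_0} \sum_{\ell=1}^{d} \langle Y, \phi_\ell \rangle^2 + \frac{2}{N_0} \sum_{\ell=1}^{d} \langle Y, \phi_\ell \rangle \langle s_m, \phi_\ell \rangle - \frac{1}{N_0} \sum_{\ell=1}^{d} \langle s_m, \phi_\ell \rangle^2.$$

Since the term

$$\frac{1}{N_0} \sum_{\ell=1}^{d} \langle Y, \phi_\ell \rangle^2$$

does not depend on the hypothesis, it is optimal to choose a message at random
from among all the messages $\hat{m}$ satisfying

$$\ln \pi_{\hat{m}} + \frac{2}{N_0} \sum_{\ell=1}^{d} \langle Y, \phi_\ell \rangle \langle s_{\hat{m}}, \phi_\ell \rangle - \frac{1}{N_0} \sum_{\ell=1}^{d} \langle s_{\hat{m}}, \phi_\ell \rangle^2
= \max_{m' \in \mathcal{M}} \biggl\{ \ln \pi_{m'} + \frac{2}{N_0} \sum_{\ell=1}^{d} \langle Y, \phi_\ell \rangle \langle s_{m'}, \phi_\ell \rangle - \frac{1}{N_0} \sum_{\ell=1}^{d} \langle s_{m'}, \phi_\ell \rangle^2 \biggr\}.$$

Part (i) of the theorem now follows from this rule using (26.31) and by noting that

$$\sum_{\ell=1}^{d} \langle Y, \phi_\ell \rangle \langle s_m, \phi_\ell \rangle
= \Bigl\langle Y, \sum_{\ell=1}^{d} \langle s_m, \phi_\ell \rangle\, \phi_\ell \Bigr\rangle
= \langle Y, s_m \rangle, \quad m \in \mathcal{M},$$

where the first equality follows by linearity (Lemma 25.10.3) and the second from
(26.30).

Part (ii) follows by noting that if M is uniform, then $\ln \pi_m$ does not depend on the
hypothesis m.

Part (iii) follows from Part (ii) because if all the mean signals are of equal energy,
then the term

$$\int_{-\infty}^{\infty} s_m^2(t)\, dt$$

does not depend on the hypothesis. □

By Note 26.6.2 we have:

Note 26.6.4. If the mean signals are distinct, then the probability of a tie is zero.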
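
The basis-free rule of Theorem 26.6.3 (i) can be sketched in a few lines of code. This is an illustrative sketch under stated assumptions rather than a receiver design: the waveforms are sampled on a uniform grid, integrals are replaced by Riemann sums, the white noise is mimicked by independent Gaussian samples of variance N0/(2·dt), and the names `map_guess`, `mean_signals`, and all numerical values are hypothetical.

```python
import numpy as np

def map_guess(y, mean_signals, priors, N0, dt, rng):
    """Guess the message index from sampled data, cf. (26.37).

    score[m] = ln(pi_m) + (2 / N0) * (<y, s_m> - ||s_m||^2 / 2),
    with inner products approximated by Riemann sums.
    Ties (to numerical precision) are resolved uniformly at random.
    """
    corr = mean_signals @ y * dt                       # <y, s_m>
    energy = np.sum(mean_signals ** 2, axis=1) * dt    # ||s_m||^2
    score = np.log(priors) + (2.0 / N0) * (corr - 0.5 * energy)
    ties = np.flatnonzero(np.isclose(score, score.max()))
    return int(rng.choice(ties))

# Hypothetical example: three unit-energy mean signals, uniform prior.
rng = np.random.default_rng(1)
dt = 1e-3
t = np.arange(0.0, 1.0, dt)
phi1 = np.sqrt(2.0) * np.sin(2 * np.pi * t)
phi2 = np.sqrt(2.0) * np.cos(2 * np.pi * t)
mean_signals = np.array([phi1, -phi1, phi2])
priors = np.full(3, 1.0 / 3.0)
N0 = 0.01

# Discrete stand-in for white Gaussian noise of PSD N0/2 (an approximation).
noise = rng.normal(0.0, np.sqrt(N0 / (2.0 * dt)), size=t.size)
y = mean_signals[2] + noise
guess = map_guess(y, mean_signals, priors, N0, dt, rng)
```

Because the prior here is uniform and the signals have equal energy, the same guess would be produced by the maximum-correlation rule (26.39), which simply picks the largest entry of `corr`.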



26.7 Performance Analysis 

The decision rule we derived in Section 26.6.1 uses the observed SP (Y(t)) to
compute the vector T of inner products with an orthonormal basis $(\phi_1, \ldots, \phi_d)$
via (26.26), with the result that the vector T has the conditional law specified
in (26.28). Our decision rule then performs MAP decoding of M based on T.
Consequently, the performance of our decoding rule is identical to the performance
of the MAP rule for guessing M based on a vector T having the conditional law
(26.28). The performance of this latter decoding rule was studied in Section 21.6.
All that remains is to translate the results from that section in order to obtain
performance bounds on our decoder.

To translate the results from Section 21.6 we need to substitute $N_0/2$ for $\sigma^2$ there;
d for J there; and (26.29) for the mean vectors there. But there is one more
translation we need: the bounds in Section 21.6 are expressed in terms of the
Euclidean distance between the mean vectors, and here we prefer to express the
bounds in terms of the distance between the mean signals. Fortunately, as we next
show, the translation is straightforward. Because $(\phi_1, \ldots, \phi_d)$ is an orthonormal
basis for $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$, it follows from Proposition 4.6.9 that

$$\sum_{\ell=1}^{d} \langle v, \phi_\ell \rangle^2 = \|v\|_2^2, \quad v \in \operatorname{span}(s_1, \ldots, s_{\mathsf{M}}). \tag{26.40}$$

Substituting $s_{m'} - s_{m''}$ for v in this identity yields

$$\sum_{\ell=1}^{d} \bigl( \langle s_{m'}, \phi_\ell \rangle - \langle s_{m''}, \phi_\ell \rangle \bigr)^2 = \|s_{m'} - s_{m''}\|_2^2
= \int_{-\infty}^{\infty} \bigl( s_{m'}(t) - s_{m''}(t) \bigr)^2\, dt,$$

where we have also used the fact that for $v = s_{m'} - s_{m''}$ we have, by the linearity
of the inner product in its left argument, $\langle v, \phi_\ell \rangle = \langle s_{m'}, \phi_\ell \rangle - \langle s_{m''}, \phi_\ell \rangle$. Thus, the
squared Euclidean distance between two mean vectors in Section 21.6 is equal to
the energy in the difference between the corresponding mean signals in our setup.
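Denoting by $p_{\text{MAP}}(\text{error} \mid M = m)$ the conditional probability of error of our decoder
conditional on M = m, and denoting by $p^*(\text{error})$ its unconditioned probability of
error (which is the optimal probability of error)

$$p^*(\text{error}) = \sum_{m \in \mathcal{M}} \pi_m\, p_{\text{MAP}}(\text{error} \mid M = m), \tag{26.41}$$

we obtain from (21.57)

$$p_{\text{MAP}}(\text{error} \mid M = m) \le \sum_{m' \ne m} Q\biggl( \frac{\|s_m - s_{m'}\|_2}{\sqrt{2 N_0}} + \frac{\sqrt{N_0/2}}{\|s_m - s_{m'}\|_2} \ln \frac{\pi_m}{\pi_{m'}} \biggr) \tag{26.42}$$

and hence, by (26.41),

$$p^*(\text{error}) \le \sum_{m \in \mathcal{M}} \pi_m \sum_{m' \ne m} Q\biggl( \frac{\|s_m - s_{m'}\|_2}{\sqrt{2 N_0}} + \frac{\sqrt{N_0/2}}{\|s_m - s_{m'}\|_2} \ln \frac{\pi_m}{\pi_{m'}} \biggr). \tag{26.43}$$

When M is uniform these bounds simplify to

$$p_{\text{MAP}}(\text{error} \mid M = m) \le \sum_{m' \ne m} Q\Biggl( \sqrt{\frac{\|s_m - s_{m'}\|_2^2}{2 N_0}} \Biggr), \quad M \text{ uniform}, \tag{26.44}$$

and

$$p^*(\text{error}) \le \frac{1}{\mathsf{M}} \sum_{m \in \mathcal{M}} \sum_{m' \ne m} Q\Biggl( \sqrt{\frac{\|s_m - s_{m'}\|_2^2}{2 N_0}} \Biggr), \quad M \text{ uniform}. \tag{26.45}$$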
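
To make the uniform-prior union bound (26.45) concrete, the sketch below evaluates it for sampled waveforms. The helper names `Q` and `union_bound_uniform`, the sampling grid, and the antipodal example are assumptions introduced here for illustration; the Q-function is expressed through the complementary error function from the Python standard library.

```python
import math
import numpy as np

def Q(x):
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def union_bound_uniform(mean_signals, dt, N0):
    """Right-hand side of (26.45) for sampled mean signals (one per row).

    The energy of each difference signal is approximated by a Riemann sum.
    """
    M = mean_signals.shape[0]
    total = 0.0
    for m in range(M):
        for k in range(M):
            if k != m:
                diff_energy = np.sum((mean_signals[m] - mean_signals[k]) ** 2) * dt
                total += Q(math.sqrt(diff_energy / (2.0 * N0)))
    return total / M

# Hypothetical example: binary antipodal signaling with unit-energy s and N0 = 1.
dt = 1e-3
t = np.arange(0.0, 1.0, dt)
s = np.sqrt(2.0) * np.sin(2 * np.pi * t)
bound = union_bound_uniform(np.array([s, -s]), dt, N0=1.0)
```

For two antipodal signals of energy E the difference signal has energy 4E, so each pairwise term is Q(sqrt(2E/N0)); with only two hypotheses the union bound involves a single error event per message and should therefore coincide with the exact error probability.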



Similarly, we can use the results from Section 21.6 to lower-bound the probability
of a guessing error. Indeed, using (21.63) we obtain

$$p_{\text{MAP}}(\text{error} \mid M = m) \ge \max_{m' \ne m} Q\biggl( \frac{\|s_m - s_{m'}\|_2}{\sqrt{2 N_0}} + \frac{\sqrt{N_0/2}}{\|s_m - s_{m'}\|_2} \ln \frac{\pi_m}{\pi_{m'}} \biggr), \tag{26.46}$$

$$p^*(\text{error}) \ge \sum_{m \in \mathcal{M}} \pi_m \max_{m' \ne m} Q\biggl( \frac{\|s_m - s_{m'}\|_2}{\sqrt{2 N_0}} + \frac{\sqrt{N_0/2}}{\|s_m - s_{m'}\|_2} \ln \frac{\pi_m}{\pi_{m'}} \biggr). \tag{26.47}$$

For a uniform prior these bounds simplify to

$$p_{\text{MAP}}(\text{error} \mid M = m) \ge \max_{m' \ne m} Q\Biggl( \sqrt{\frac{\|s_m - s_{m'}\|_2^2}{2 N_0}} \Biggr), \quad M \text{ uniform}, \tag{26.48}$$

$$p^*(\text{error}) \ge \frac{1}{\mathsf{M}} \sum_{m \in \mathcal{M}} \max_{m' \ne m} Q\Biggl( \sqrt{\frac{\|s_m - s_{m'}\|_2^2}{2 N_0}} \Biggr), \quad M \text{ uniform}. \tag{26.49}$$

We begin with a lemma regarding sufficient statistics in testing whether a random
vector Y was drawn $\mathcal{N}(\mu, \Lambda)$ or $\mathcal{N}(-\mu, \Lambda)$.

Lemma 26.8.1. Let H be a binary RV, and let the random vector Y be $\mathcal{N}(\mu, \Lambda)$
conditional on H = 0 and $\mathcal{N}(-\mu, \Lambda)$ conditional on H = 1. If μ is a scalar multiple
of the last column of Λ, then the last component of Y forms a sufficient statistic
for guessing H based on Y.

Proof. Let n denote the number of components of the vectors Y and μ, so Λ
is n × n. To show that $Y^{(n)}$ is a sufficient statistic we shall calculate the log
likelihood-ratio function and then show that it is computable from $Y^{(n)}$. This
approach, while straightforward, does not prove the lemma in its fullest generality
because it only covers the case where Y has a density, i.e., when the covariance
matrix Λ is nonsingular. Referring the reader to Section 26.14 on Page 606 for a
somewhat less intuitive proof that covers all cases, we proceed here to address the
case where Λ is nonsingular.

The condition that μ is a scalar multiple of the last column of Λ is equivalent to
the existence of some $\alpha \in \mathbb{R}$ such that

$$\mu = \Lambda\, (0, \ldots, 0, \alpha)^{\mathsf{T}}. \tag{26.50}$$



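When Λ is nonsingular we can use the explicit form of the density of the multivariate
Gaussian distribution (23.56) to express the log likelihood-ratio as

$$\begin{align}
\ln \frac{f_{Y \mid H = 0}(y)}{f_{Y \mid H = 1}(y)}
&= \ln \frac{ \frac{1}{\sqrt{(2\pi)^n \det \Lambda}}\, e^{-\frac{1}{2} (y - \mu)^{\mathsf{T}} \Lambda^{-1} (y - \mu)} }{ \frac{1}{\sqrt{(2\pi)^n \det \Lambda}}\, e^{-\frac{1}{2} (y + \mu)^{\mathsf{T}} \Lambda^{-1} (y + \mu)} } \notag \\
&= \frac{1}{2} (y + \mu)^{\mathsf{T}} \Lambda^{-1} (y + \mu) - \frac{1}{2} (y - \mu)^{\mathsf{T}} \Lambda^{-1} (y - \mu) \notag \\
&= y^{\mathsf{T}} \Lambda^{-1} \mu + \mu^{\mathsf{T}} \Lambda^{-1} y \notag \\
&= 2\, y^{\mathsf{T}} \Lambda^{-1} \mu \tag{26.51} \\
&= 2\, y^{\mathsf{T}} \Lambda^{-1} \Lambda\, (0, \ldots, 0, \alpha)^{\mathsf{T}} \tag{26.52} \\
&= 2 \alpha\, y^{(n)}, \quad y \in \mathbb{R}^n, \tag{26.53}
\end{align}$$

where (26.51) holds because the scalar $\mu^{\mathsf{T}} \Lambda^{-1} y$ is equal to its transpose, and the
latter, by the transposition law $(AB)^{\mathsf{T}} = B^{\mathsf{T}} A^{\mathsf{T}}$, is given by $y^{\mathsf{T}} (\Lambda^{-1})^{\mathsf{T}} \mu$, which by
the symmetry of Λ (and hence also of its inverse) is equal to $y^{\mathsf{T}} \Lambda^{-1} \mu$; and where
(26.52) follows from (26.50).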

It follows from (26.53) that the likelihood-ratio is computable from the last com- 
ponent of Y, thus establishing that this component forms a sufficient statistic 
(Definition 20.12.2). □ 
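
The computation in the proof can be checked numerically. The following is a minimal sketch, assuming an arbitrary randomly generated positive definite covariance matrix and an arbitrary scalar (both are illustrative choices, not taken from the text): it evaluates the log likelihood-ratio from the two quadratic forms and compares it with twice the scalar times the last component of the observation, as in (26.53).

```python
import numpy as np

def log_likelihood_ratio(y, mu, cov):
    """ln f_{Y|H=0}(y) - ln f_{Y|H=1}(y) for N(mu, cov) versus N(-mu, cov)."""
    cov_inv = np.linalg.inv(cov)
    quad_h0 = (y - mu) @ cov_inv @ (y - mu)
    quad_h1 = (y + mu) @ cov_inv @ (y + mu)
    # The normalizing constants are the same under both hypotheses and cancel.
    return 0.5 * quad_h1 - 0.5 * quad_h0

rng = np.random.default_rng(0)
n = 4
root = rng.standard_normal((n, n))
cov = root @ root.T + n * np.eye(n)    # assumed example: symmetric positive definite
alpha = 0.7                            # assumed scalar

e_last = np.zeros(n)
e_last[-1] = alpha
mu = cov @ e_last                      # mean is alpha times the last column, as in (26.50)

y = rng.standard_normal(n)             # an arbitrary observation
llr = log_likelihood_ratio(y, mu, cov)
shortcut = 2 * alpha * y[-1]           # right-hand side of (26.53)
print(llr, shortcut)
```

The two printed numbers agree up to rounding, which illustrates that only the last component of the observation enters the log likelihood-ratio.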



26.8.2 The Binary Antipodal Case 

We begin the proof of Theorem 26.4.1 by considering the special case of binary 
hypothesis testing (M = 2) where the mean signals are antipodal to each other, 
i.e., when their sum is the all-zero signal. Since we are now treating the binary 
hypothesis testing setting, we denote the RV we wish to guess by H and assume 
that it takes value in the set {0, 1}. We denote the mean signal corresponding to 
$H = 0$ by $\mathbf{s}$, so the mean signal corresponding to $H = 1$ is $-\mathbf{s}$. We assume that $\mathbf{s}$ 
is an integrable signal that is bandlimited to W Hz. Conditional on $H = 0$, the 
received signal $(Y(t))$ is given at each $t \in \mathbb{R}$ by $s(t) + N(t)$, where $(N(t))$ is white 
Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W. Conditional on 
$H = 1$ the time-$t$ received signal is $-s(t) + N(t)$. 

Recall from Definition 26.3.1 that to show that $\langle \mathbf{Y}, \mathbf{s} \rangle$ forms a sufficient statistic 
for guessing $H$ based on the observation $(Y(t))$ we need to show that for every 
positive integer $\eta$ and every choice of the epochs $t_1, \ldots, t_\eta \in \mathbb{R}$ the RV $\langle \mathbf{Y}, \mathbf{s} \rangle$ 
forms a sufficient statistic for guessing $H$ based on the observation consisting of 
the random vector¹¹ 

\[ \bigl( Y(t_1), \ldots, Y(t_\eta), \langle \mathbf{Y}, \mathbf{s} \rangle \bigr)^{\mathsf{T}}. \tag{26.54} \]

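This we prove by showing that this vector satisfies the assumptions of Lemma 26.8.1. 
Denoting the conditional mean of this vector, conditional on $H = 0$, by $\boldsymbol{\mu}$, we have 

\[ \boldsymbol{\mu} = \bigl( s(t_1), \ldots, s(t_\eta), \|\mathbf{s}\|_2^2 \bigr)^{\mathsf{T}} \tag{26.55} \]

because 

\begin{align*}
\mathsf{E}[Y(t_\nu) \mid H = 0] &= \mathsf{E}[s(t_\nu) + N(t_\nu) \mid H = 0] \\
&= s(t_\nu) + \mathsf{E}[N(t_\nu)] \\
&= s(t_\nu), \quad \nu = 1, \ldots, \eta,
\end{align*}

and 

\begin{align*}
\mathsf{E}[\langle \mathbf{Y}, \mathbf{s} \rangle \mid H = 0] &= \mathsf{E}[\langle \mathbf{s} + \mathbf{N}, \mathbf{s} \rangle] \\
&= \|\mathbf{s}\|_2^2 + \mathsf{E}[\langle \mathbf{N}, \mathbf{s} \rangle] \\
&= \|\mathbf{s}\|_2^2
\end{align*}

(Theorem 25.12.2). The conditional covariance matrix $\Lambda$ of the vector in (26.54) 
conditional on $H = 0$ is given by the $(\eta + 1) \times (\eta + 1)$ matrix 

\[
\begin{pmatrix}
K_{NN}(0) & K_{NN}(t_1 - t_2) & \cdots & K_{NN}(t_1 - t_\eta) & s(t_1) N_0/2 \\
K_{NN}(t_2 - t_1) & K_{NN}(0) & \cdots & K_{NN}(t_2 - t_\eta) & s(t_2) N_0/2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
K_{NN}(t_\eta - t_1) & K_{NN}(t_\eta - t_2) & \cdots & K_{NN}(0) & s(t_\eta) N_0/2 \\
s(t_1) N_0/2 & s(t_2) N_0/2 & \cdots & s(t_\eta) N_0/2 & \|\mathbf{s}\|_2^2\, N_0/2
\end{pmatrix}
\tag{26.56}
\]

(Proposition 25.15.2). Conditional on $H = 1$, the mean of the vector in (26.54) 
is $-\boldsymbol{\mu}$ and the covariance matrix is also $\Lambda$. From Proposition 25.11.1 regarding 
linear functionals of Gaussian stochastic processes, it follows that, conditional on 
$H = 0$, the vector in (26.54) is Gaussian. Likewise conditional on $H = 1$. And by 
(26.55) & (26.56) the mean vector $\boldsymbol{\mu}$ is equal to the last column of the covariance 
matrix (26.56) scaled by $2/N_0$. 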



11 The measurability of $\langle \mathbf{Y}, \mathbf{s} \rangle$ with respect to the $\sigma$-algebra generated by $(Y(t))$ follows from 
Proposition 25.10.1. 

Having established that the random vector (26.54) satisfies the hypotheses of 
Lemma 26.8.1, we can infer that its last component, namely $\langle \mathbf{Y}, \mathbf{s} \rangle$, forms a sufficient 
statistic for guessing $H$ based on the vector in (26.54). Since $\eta \in \mathbb{N}$ and 
$t_1, \ldots, t_\eta \in \mathbb{R}$ are here arbitrary, this proves that $\langle \mathbf{Y}, \mathbf{s} \rangle$ is sufficient for guessing $H$ 
based on $(Y(t))$, thus proving Theorem 26.4.1 for the two-hypotheses case with 
antipodal mean signals. 

26.8.3 The General Binary Case 

We next prove Theorem 26.4.1 in the more general binary hypothesis testing setting 
where the mean signals are not necessarily antipodal. We denote the mean signals 
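corresponding to $H = 0$ and $H = 1$ by $\mathbf{s}_0$ and $\mathbf{s}_1$, and we assume that both are 
integrable signals that are bandlimited to W Hz. We need to show that the vector 

\[ \bigl( \langle \mathbf{Y}, \mathbf{s}_0 \rangle, \langle \mathbf{Y}, \mathbf{s}_1 \rangle \bigr)^{\mathsf{T}} \tag{26.57} \]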

forms a sufficient statistic. 

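Before giving a formal proof, we provide some intuition. Based on the observation 
$(Y(t))$, the receiver can compute the waveform 

\[ \tilde{Y}(t) = Y(t) - \frac{s_0(t) + s_1(t)}{2}, \quad t \in \mathbb{R}. \]

Since the transformation from $(Y(t))$ to $(\tilde{Y}(t))$ is reversible, there is no loss in 
optimality in basing one's decision on $(\tilde{Y}(t))$. Conditional on $H = 0$ the SP $(\tilde{Y}(t))$ 
is of the form 

\begin{align*}
\tilde{Y}(t) &= s_0(t) + N(t) - \frac{s_0(t) + s_1(t)}{2} \\
&= \frac{s_0(t) - s_1(t)}{2} + N(t), \quad t \in \mathbb{R},
\end{align*}

whereas conditional on $H = 1$ it is of the form 

\begin{align*}
\tilde{Y}(t) &= s_1(t) + N(t) - \frac{s_0(t) + s_1(t)}{2} \\
&= -\frac{s_0(t) - s_1(t)}{2} + N(t), \quad t \in \mathbb{R}.
\end{align*}

Consequently, the problem of guessing $H$ based on $(\tilde{Y}(t))$ is the antipodal problem 
we addressed before with the received waveform being $(\tilde{Y}(t))$ and with the mean 
signals corresponding to $H = 0$ and $H = 1$ being $(\mathbf{s}_0 - \mathbf{s}_1)/2$ and $-(\mathbf{s}_0 - \mathbf{s}_1)/2$ 
respectively. From our treatment of the antipodal case, we know that for this 
problem $\langle \tilde{\mathbf{Y}}, (\mathbf{s}_0 - \mathbf{s}_1)/2 \rangle$ forms a sufficient statistic. This sufficient statistic can 
be written more explicitly as 

\begin{align*}
\Bigl\langle \tilde{\mathbf{Y}}, \frac{\mathbf{s}_0 - \mathbf{s}_1}{2} \Bigr\rangle
&= \Bigl\langle \mathbf{Y} - \frac{\mathbf{s}_0 + \mathbf{s}_1}{2}, \frac{\mathbf{s}_0 - \mathbf{s}_1}{2} \Bigr\rangle \\
&= \frac{1}{2} \langle \mathbf{Y}, \mathbf{s}_0 \rangle - \frac{1}{2} \langle \mathbf{Y}, \mathbf{s}_1 \rangle - \frac{1}{4} \bigl( \|\mathbf{s}_0\|_2^2 - \|\mathbf{s}_1\|_2^2 \bigr),
\end{align*}

thus demonstrating that this sufficient statistic is computable from the vector 
in (26.57).¹² 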

For readers who prefer a more formal proof we offer the following. Define 

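\[ \mathbf{s} = \frac{\mathbf{s}_0 - \mathbf{s}_1}{2}. \tag{26.58} \]

Since $\langle \mathbf{Y} - (\mathbf{s}_0 + \mathbf{s}_1)/2, \mathbf{s} \rangle$ is computable from the vector in (26.57), it follows from 
Proposition 22.4.2 that to prove that the vector in (26.57) is sufficient it is enough to 
establish that $\langle \mathbf{Y} - (\mathbf{s}_0 + \mathbf{s}_1)/2, \mathbf{s} \rangle$ is sufficient. We thus need to show that for every 
$\eta \in \mathbb{N}$ and for every choice of the epochs $t_1, \ldots, t_\eta \in \mathbb{R}$, the RV $\langle \mathbf{Y} - (\mathbf{s}_0 + \mathbf{s}_1)/2, \mathbf{s} \rangle$ 
forms a sufficient statistic for the hypothesis testing problem of guessing $H$ based 
on $Y(t_1), \ldots, Y(t_\eta), \langle \mathbf{Y} - (\mathbf{s}_0 + \mathbf{s}_1)/2, \mathbf{s} \rangle$, i.e., that for every prior on $H$, 

\[ H \;-\!\!\circ\!\!-\; \Bigl\langle \mathbf{Y} - \frac{\mathbf{s}_0 + \mathbf{s}_1}{2}, \mathbf{s} \Bigr\rangle \;-\!\!\circ\!\!-\; \bigl( Y(t_1), \ldots, Y(t_\eta) \bigr). \]

Equivalently, since subtracting deterministic quantities does not alter conditional 
independence, it suffices to show that 

\[ H \;-\!\!\circ\!\!-\; \Bigl\langle \mathbf{Y} - \frac{\mathbf{s}_0 + \mathbf{s}_1}{2}, \mathbf{s} \Bigr\rangle \;-\!\!\circ\!\!-\; \Bigl( Y(t_1) - \frac{s_0(t_1) + s_1(t_1)}{2}, \ldots, Y(t_\eta) - \frac{s_0(t_\eta) + s_1(t_\eta)}{2} \Bigr). \]

This can be proved by applying Lemma 26.8.1 to the vector 

\[ \Bigl( Y(t_1) - \frac{s_0(t_1) + s_1(t_1)}{2}, \ldots, Y(t_\eta) - \frac{s_0(t_\eta) + s_1(t_\eta)}{2}, \Bigl\langle \mathbf{Y} - \frac{\mathbf{s}_0 + \mathbf{s}_1}{2}, \mathbf{s} \Bigr\rangle \Bigr)^{\mathsf{T}} \]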



which, conditional on H = 0, is Gaussian, with the covariance matrix in (26.56) 
and with the mean vector being the RHS of (26.55) (with s defined in (26.58)) and 
which, conditional on H = 1, is Gaussian with the same covariance matrix (26.56) 
but with the conditional mean being antipodal to the RHS of (26.55). 

26.8.4 The General Case 

We now prove the general (not necessarily binary) case of Theorem 26.4.1. There 
is surprisingly little left to do. The key is Proposition 22.3.2, which demonstrates 
that if a function of the observation is sufficient for testing between any two of the 
hypotheses, then it is sufficient for the multi-hypothesis testing problem. 

To prove that the vector (26.15) of inner products forms a sufficient statistic we 
need to show that for every $\eta \in \mathbb{N}$ and for any choice of the epochs $t_1, \ldots, t_\eta \in \mathbb{R}$ 
the inner products vector (26.15) forms a sufficient statistic for guessing $M$ based 
on the observation consisting of $Y(t_1), \ldots, Y(t_\eta)$ and of the inner products vector 
(Definition 26.3.1). By Proposition 22.3.2, it is enough to show this when testing 
between any two fixed distinct messages $m', m'' \in \mathcal{M}$. But in this case the sufficiency 
of the inner products vector (26.15) follows directly from Section 26.8.3 and 
Proposition 22.4.2 because, by the general binary hypothesis testing case treated in 
Section 26.8.3, the two inner products $\langle \mathbf{Y}, \mathbf{s}_{m'} \rangle$ & $\langle \mathbf{Y}, \mathbf{s}_{m''} \rangle$ suffice for this problem, 
and these two inner products are obviously computable from the inner products 
vector (26.15) (simply by ignoring its other components). This completes the proof 
of Theorem 26.4.1. 

12 This is only a heuristic argument because it only shows that it is optimal to guess $H$ based 
on the vector (26.57). It does not prove that this vector forms a sufficient statistic. 

26.9 The Front-End Filter 

Receivers in practice rarely have the structure depicted in Figure 26.1 because — 
although mathematically optimal — its hardware implementation is challenging. 
The difficulty is related to the "dynamic range" problem in implementing the 
matched filter: it is very difficult to design a perfectly-linear system to exact specifi- 
cation. Linearity is usually only guaranteed for a certain range of input amplitudes. 
Once the amplitude of the signal exceeds a certain level, the circuit often "clips" 
the input waveform and no longer behaves linearly. Similarly, input signals that are 
too small might be below the sensitivity of the circuit and might therefore produce 
no output, thus violating linearity. This is certainly the case with circuits that 
employ analog-to-digital conversion followed by digital processing, because analog- 
to-digital converters can only represent the input using a fixed number of bits. 
The problem with the structure depicted in Figure 26.1 is that the noise (N(t)) is 
typically much larger than the mean signal, so it becomes very difficult to design a 
circuit to exact specifications that will be linear enough to guarantee that its action 
on the received waveform (consisting of the weak transmitted waveform and the 
strong additive noise) be the sum of the required responses to the mean signal and 
to the noise-signal. (That the noise is typically much larger than the mean signals 
can be seen from the heuristic plot of its PSD; see Figure 25.3. White Gaussian 
noise is often of PSD $N_0/2$ over frequency bands that are much larger than the 
band $[-\mathrm{W}, \mathrm{W}]$ so, by (25.30), the variance of the noise can be extremely large.) 

The engineering solution to the dynamic range problem is to pass the received 
waveform through a "front-end filter" and to then feed this filter's output to the 
matched filter; see Figure 26.3. Except for a few very stringent requirements, 
the specifications of the front-end filter are relatively lax. The first specification 
is that the filter be linear over a very large range of input levels. This is usu- 
ally accomplished by using only passive elements to design the filter. The second 
requirement is that the front-end filter's frequency response be of unit-gain over 
the mean signals' frequency band $[-\mathrm{W}, \mathrm{W}]$ so that it will not distort the mean 
signals.¹³ Additionally, we require that the filter be stable and that its frequency 
response decay to zero sharply for frequencies outside the band $[-\mathrm{W}, \mathrm{W}]$. This latter 
condition guarantees that the filter's response to the noise be of small variance 
so that the dynamic range of the signal at the filter's output be moderate. If we 
denote the front-end filter's impulse response by $h_{\mathrm{FE}}$, then the key mathematical 
requirements are linearity; stability, i.e., 

\[ \int_{-\infty}^{\infty} |h_{\mathrm{FE}}(t)| \, dt < \infty; \tag{26.59} \]

and the unit-gain requirement 

\[ \hat{h}_{\mathrm{FE}}(f) = 1, \quad |f| \le \mathrm{W}. \tag{26.60} \]

13 Imprecisions here can often be corrected using signal processing. 

Figure 26.3: Feeding the signal to a front-end filter and then computing the inner 
products with the mean signals. 

Figure 26.4: An example of the frequency response of a front-end filter. 

An example of the frequency response of a front-end filter is depicted in Figure 26.4. 

In the rest of this section we shall prove that, as long as these assumptions are met, 
there is no loss in optimality in introducing the front-end filter as in Figure 26.3. 
(In the ideal mathematical world there is, of course, nothing to be gained from this 
filter, because the structure we introduced in Figure 26.1 is optimal.) 

The crux of the proof is in showing that, like $(Y(t))$, the front-end filter's output 
is the sum of the transmitted signal and white Gaussian noise of PSD $N_0/2$ with 
respect to the bandwidth W. Once this is established, the result follows by recalling 
that the conditional joint distribution of the matched filters' outputs does not 
depend on the PSD of the noise outside the band $[-\mathrm{W}, \mathrm{W}]$ (Note 26.5.1). 

We thus proceed to analyze the front-end filter's output, which we denote by $(\tilde{Y}(t))$: 

\[ (\tilde{Y}(t)) = (Y(t)) \star h_{\mathrm{FE}}. \tag{26.61} \]

We first note that (26.60) and the assumption that $\mathbf{s}_m$ is an integrable signal that 
is bandlimited to W Hz guarantee that 

\[ \mathbf{s}_m \star h_{\mathrm{FE}} = \mathbf{s}_m, \quad m \in \mathcal{M} \tag{26.62} \]

(Proposition 6.5.2 and Proposition 6.4.10 cf. (b)). By (26.62) and by the linearity 
of the filter we can thus express the filter's output (conditional on $M = m$) as 

\begin{align}
(\tilde{Y}(t)) &= (Y(t)) \star h_{\mathrm{FE}} \notag \\
&= \mathbf{s}_m \star h_{\mathrm{FE}} + (N(t)) \star h_{\mathrm{FE}} \notag \\
&= \mathbf{s}_m + (N(t)) \star h_{\mathrm{FE}}. \tag{26.63}
\end{align}

We next show that the SP $(N(t)) \star h_{\mathrm{FE}}$ on the RHS of (26.63) is white Gaussian 
noise of PSD $N_0/2$ with respect to the bandwidth W. This follows from Theorem 25.13.2. 
Indeed, being the result of passing a measurable stationary Gaussian 
SP through a stable filter, it is a measurable stationary Gaussian SP. And its PSD 
is 

\[ f \mapsto S_{NN}(f)\, \bigl| \hat{h}_{\mathrm{FE}}(f) \bigr|^2, \tag{26.64} \]

which is equal to $N_0/2$ for all frequencies $f \in [-\mathrm{W}, \mathrm{W}]$, because for these frequencies 
$S_{NN}(f)$ is equal to $N_0/2$ and $\hat{h}_{\mathrm{FE}}(f)$ is equal to one. Note that at frequencies 
outside the band $[-\mathrm{W}, \mathrm{W}]$ the PSD of $(N(t)) \star h_{\mathrm{FE}}$ may differ from that of $(N(t))$. 

We thus conclude that the front-end filter's output, like its input, can be expressed 
as the transmitted signal corrupted by white Gaussian noise of PSD $N_0/2$ with 
respect to the bandwidth W. Note 26.5.1 now guarantees that for every $m \in \mathcal{M}$ 
we have that, conditional on $M = m$, the distribution of 

\[ \Bigl( \int_{-\infty}^{\infty} \tilde{Y}(t)\, s_1(t) \, dt, \ldots, \int_{-\infty}^{\infty} \tilde{Y}(t)\, s_{\mathsf{M}}(t) \, dt \Bigr)^{\mathsf{T}} \]

is identical to the conditional distribution of the random vector in (26.15). 

The advantage of the front-end filter becomes apparent when we re-examine the 
PSD of the noise at its output. If the front-end filter's frequency response decays 
very sharply to zero for frequencies outside the band $[-\mathrm{W}, \mathrm{W}]$, then, by (26.64), 
this PSD will be nearly zero outside this band. Consequently, the variance of the 
noise at the front-end filter's output — which is the integral of this PSD — will be 
greatly reduced. This will guarantee that the dynamic range at the filter's output 
be much smaller than at its input, thus simplifying the implementation of the 
matched filters. 
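
To get a feel for the size of this reduction, here is a rough numerical sketch. Everything concrete in it is an assumption made for illustration: the input noise PSD is taken to be flat at $N_0/2$ over a band 100 times wider than the signal band, and the front-end response is a hypothetical one with unit gain on $[-\mathrm{W}, \mathrm{W}]$ and a linear roll-off to zero over a transition band of width $\mathrm{W}/2$. The output PSD is formed as in (26.64) and both variances are obtained by integrating the PSDs with a Riemann sum.

```python
import numpy as np

N0 = 2.0       # assumed units, so that the in-band PSD N0/2 equals 1
W = 1.0        # assumed signal bandwidth in Hz
B = 100.0      # assumed band over which the input noise PSD is flat

f = np.linspace(-B, B, 400_001)
df = f[1] - f[0]

psd_in = np.full_like(f, N0 / 2)

# Hypothetical front-end frequency response: unit gain on [-W, W], then a
# linear roll-off to zero over a transition band of width W/2.
transition = 0.5 * W
gain = np.clip(1.0 - (np.abs(f) - W) / transition, 0.0, 1.0)

psd_out = psd_in * gain**2            # output PSD, as in (26.64)

var_in = np.sum(psd_in) * df          # approximately N0 * B
var_out = np.sum(psd_out) * df        # approximately (N0/2) * (2W + 2*transition/3)
in_band = np.abs(f) <= W

print(var_in, var_out)
```

In this sketch the in-band PSD is untouched while the noise variance drops by roughly two orders of magnitude; a sharper roll-off would bring the output variance closer to $N_0 \mathrm{W}$.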



26.10 Detection in Passband 

The detection problem in passband is very similar to the one in baseband. The 
difference is that the mean signals $\{\mathbf{s}_m\}$ are now assumed to be integrable signals 
that are bandlimited to W Hz around the carrier frequency $f_c$ (Definition 7.2.1) 
and that the noise is now assumed to be white Gaussian noise of PSD $N_0/2$ with 
respect to the bandwidth W around $f_c$ (Definition 25.15.3). 

Here too, the inner products in (26.16) form a sufficient statistic. So do those in 
(26.18) whenever the signals $\{\tilde{\mathbf{s}}_j\}$ satisfy 

\[ \mathbf{s}_m \in \operatorname{span}(\tilde{\mathbf{s}}_1, \ldots, \tilde{\mathbf{s}}_n), \quad m \in \mathcal{M} \]

and are integrable signals that are bandlimited to W Hz around $f_c$. 

For every $m \in \mathcal{M}$ the conditional distribution of the vector of inner products 
in (26.18), conditional on $M = m$, is Gaussian with mean vector (26.22) and 
covariance matrix (26.23). The latter covariance matrix can also be written in 
terms of the baseband representation of the mean signals using the relation 

\[ \langle \tilde{\mathbf{s}}_{j'}, \tilde{\mathbf{s}}_{j''} \rangle = 2 \operatorname{Re}\bigl( \langle \tilde{\mathbf{s}}_{j',\mathrm{BB}}, \tilde{\mathbf{s}}_{j'',\mathrm{BB}} \rangle \bigr), \tag{26.65} \]

where $\tilde{\mathbf{s}}_{j',\mathrm{BB}}$ and $\tilde{\mathbf{s}}_{j'',\mathrm{BB}}$ are the baseband representations of $\tilde{\mathbf{s}}_{j'}$ and $\tilde{\mathbf{s}}_{j''}$ 
(Theorem 7.6.10). 
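The computation of the inner products (26.18) can be performed in passband 
by feeding the signal $\mathbf{Y}$ directly to filters that are matched to the passband signals 
$\{\tilde{\mathbf{s}}_j\}$, or in baseband by expressing the inner product $\langle \mathbf{Y}, \tilde{\mathbf{s}}_j \rangle$ in terms of the 
baseband representation $\tilde{\mathbf{s}}_{j,\mathrm{BB}}$ of $\tilde{\mathbf{s}}_j$ as follows: 

\begin{align*}
\langle \mathbf{Y}, \tilde{\mathbf{s}}_j \rangle
&= \bigl\langle \mathbf{Y}, t \mapsto 2 \operatorname{Re}\bigl( \tilde{s}_{j,\mathrm{BB}}(t)\, e^{i 2\pi f_c t} \bigr) \bigr\rangle \\
&= 2 \int_{-\infty}^{\infty} \bigl( Y(t) \cos(2\pi f_c t) \bigr) \operatorname{Re}\bigl( \tilde{s}_{j,\mathrm{BB}}(t) \bigr) \, dt \\
&\quad - 2 \int_{-\infty}^{\infty} \bigl( Y(t) \sin(2\pi f_c t) \bigr) \operatorname{Im}\bigl( \tilde{s}_{j,\mathrm{BB}}(t) \bigr) \, dt.
\end{align*}

This expression suggests computing the inner product $\langle \mathbf{Y}, \tilde{\mathbf{s}}_j \rangle$ using two baseband 
matched filters: one that is matched to $\operatorname{Re}(\tilde{\mathbf{s}}_{j,\mathrm{BB}})$ and that is fed the product 
of $(Y(t))$ and $\cos(2\pi f_c t)$, and one that is matched to $\operatorname{Im}(\tilde{\mathbf{s}}_{j,\mathrm{BB}})$ and that is fed the 
product of $(Y(t))$ and $\sin(2\pi f_c t)$.¹⁴ 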
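
The identity between the passband and the two-branch baseband computation holds pointwise in $t$, so it can be illustrated on sampled waveforms. The sketch below is only that: the carrier frequency, the Gaussian-shaped complex baseband pulse, and the use of sampled pseudo-random noise as a stand-in for the received waveform are all assumptions made for illustration, and integrals are replaced by Riemann sums.

```python
import numpy as np

fc = 50.0                                   # assumed carrier frequency in Hz
t = np.linspace(-2.0, 2.0, 40_001)
dt = t[1] - t[0]

# Hypothetical complex baseband pulse, slowly varying relative to the carrier.
s_bb = (1.0 + 0.5j) * np.exp(-t**2) * np.exp(1j * 2 * np.pi * 1.5 * t)
s_pb = 2 * np.real(s_bb * np.exp(1j * 2 * np.pi * fc * t))   # passband signal

rng = np.random.default_rng(1)
y = s_pb + rng.standard_normal(t.size)      # stand-in for a received waveform

# Direct passband correlation.
direct = np.sum(y * s_pb) * dt

# Two baseband branches: multiply by cos/sin, correlate with Re/Im parts.
in_phase = np.sum(y * np.cos(2 * np.pi * fc * t) * np.real(s_bb)) * dt
quadrature = np.sum(y * np.sin(2 * np.pi * fc * t) * np.imag(s_bb)) * dt
via_baseband = 2 * in_phase - 2 * quadrature

print(direct, via_baseband)
```

Both routes give the same number up to rounding, because $2\operatorname{Re}(z e^{i\theta}) = 2\operatorname{Re}(z)\cos\theta - 2\operatorname{Im}(z)\sin\theta$ for every complex $z$ and real $\theta$.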

As discussed in Section 26.9, in practice one typically first feeds the received signal 
$(Y(t))$ to a stable highly-linear bandpass filter of frequency response $\hat{h}_{\mathrm{PB\text{-}FE}}(\cdot)$ 
satisfying 

\[ \hat{h}_{\mathrm{PB\text{-}FE}}(f) = 1, \quad \bigl| |f| - f_c \bigr| \le \mathrm{W}/2, \tag{26.66} \]

with the frequency response decaying drastically at other frequencies to guarantee 
that the filter's output be of small dynamic range. 

14 Since the baseband representation of an integrable passband signal that is bandlimited to W 
Hz around the carrier frequency $f_c$ is integrable (Proposition 7.6.2), it follows that our assumption 
that $\tilde{\mathbf{s}}_j$ is an integrable function that is bandlimited to W Hz around the carrier frequency $f_c$ 
guarantees that both $t \mapsto \cos(2\pi f_c t) \operatorname{Re}(\tilde{s}_{j,\mathrm{BB}}(t))$ and $t \mapsto \sin(2\pi f_c t) \operatorname{Im}(\tilde{s}_{j,\mathrm{BB}}(t))$ are integrable. 
Hence, with probability one, both the integrals $\int_{-\infty}^{\infty} (Y(t) \cos(2\pi f_c t)) \operatorname{Re}(\tilde{s}_{j,\mathrm{BB}}(t)) \, dt$ and 
$\int_{-\infty}^{\infty} (Y(t) \sin(2\pi f_c t)) \operatorname{Im}(\tilde{s}_{j,\mathrm{BB}}(t)) \, dt$ exist. 

26.11 Some Examples 



26.11.1 Binary Hypothesis Testing 

Before treating the general binary hypothesis testing problem we begin with the 
case of antipodal signaling with a uniform prior. In this case 

\[ \mathbf{s}_0 = -\mathbf{s}_1 = \mathbf{s}, \tag{26.67} \]

where $\mathbf{s}$ is some integrable signal that is bandlimited to W Hz. We denote its 
energy by $E_s$, i.e., 

\[ E_s = \|\mathbf{s}\|_2^2 \tag{26.68} \]

and assume that it is strictly positive. In this case the dimension of the linear 
subspace spanned by the mean signals is one, and this subspace is spanned by the 
unit-norm signal 

\[ \boldsymbol{\phi} = \frac{\mathbf{s}}{\sqrt{E_s}}. \tag{26.69} \]

Depending on the outcome of a fair coin toss, either $\mathbf{s}$ or $-\mathbf{s}$ is sent over the channel. 
We observe the SP $(Y(t))$ given by the sum of the transmitted signal and white 
Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W, and we wish to 
guess which signal was sent. How should we form our guess? 

By Theorem 26.4.1 a sufficient statistic for this guessing problem is $T = \langle \mathbf{Y}, \boldsymbol{\phi} \rangle$. 
Conditional on $H = 0$, we have $T \sim \mathcal{N}(\sqrt{E_s}, N_0/2)$, whereas, conditional on 
$H = 1$, we have $T \sim \mathcal{N}(-\sqrt{E_s}, N_0/2)$. How to guess $H$ based on $T$ is the problem 
we addressed in Section 20.10. There we showed that it is optimal to guess "$H = 0$" 
if $T > 0$ and to guess "$H = 1$" if $T < 0$. (The case $T = 0$ occurs with probability 
zero, so we need not worry about it.) An optimal decision rule for guessing $H$ 
based on $(Y(t))$ is thus: 

\[ \text{Guess ``$H = 0$'' if } \int_{-\infty}^{\infty} Y(t)\, s(t) \, dt > 0. \tag{26.70} \]

Let $p_{\mathrm{MAP}}(\text{error} \mid \mathbf{s})$ denote the conditional probability of error of this decision rule 
given that $\mathbf{s}$ was sent; let $p_{\mathrm{MAP}}(\text{error} \mid -\mathbf{s})$ be similarly defined; and let $p^*(\text{error})$ 
denote the optimal probability of error of this problem. By the optimality of our 
rule, 

\[ p^*(\text{error}) = \frac{1}{2} \bigl( p_{\mathrm{MAP}}(\text{error} \mid \mathbf{s}) + p_{\mathrm{MAP}}(\text{error} \mid -\mathbf{s}) \bigr). \]

Using the expression for the error probability derived in Section 20.10 we obtain 

\[ p^*(\text{error}) = Q\Biggl( \sqrt{\frac{2 \|\mathbf{s}\|_2^2}{N_0}} \Biggr), \tag{26.71} \]

which, in view of (26.68), can also be written as 

\[ p^*(\text{error}) = Q\Biggl( \sqrt{\frac{2 E_s}{N_0}} \Biggr). \tag{26.72} \]

Note that, as expected from Section 26.5.2 and in particular from Proposition 26.5.2, 
the probability of error is determined by the "geometry" of the problem, which in 
this case is summarized by the energy in $\mathbf{s}$. 

There is also a nice geometric interpretation to (26.72). The distance between the 
mean signals $\mathbf{s}$ and $-\mathbf{s}$ is $\|\mathbf{s} - (-\mathbf{s})\|_2 = 2\sqrt{E_s}$. Half the distance is $\sqrt{E_s}$. The 
inner product between the noise and the unit-length vector $\boldsymbol{\phi}$ pointing from $-\mathbf{s}$ 
to $\mathbf{s}$ is $\mathcal{N}(0, N_0/2)$. Half the distance thus corresponds to $\sqrt{E_s} / \sqrt{N_0/2}$ standard 
deviations of this inner product. The probability of error is thus the probability 
that a standard Gaussian is greater than half the distance between the signals as 
measured by standard deviations of the inner product between the noise and the 
unit-length vector pointing from $-\mathbf{s}$ towards $\mathbf{s}$. 

Consider now the more general binary hypothesis testing problem where both hypotheses 
are still equally likely, but where now the mean signals $\mathbf{s}_0$ and $\mathbf{s}_1$ are not 
antipodal, i.e., they do not sum to zero. Our approach to this problem is to reduce 
it to the antipodal case we already treated. We begin by forming the signal $(\tilde{Y}(t))$ 
by subtracting $(\mathbf{s}_0 + \mathbf{s}_1)/2$ from the received signal, so 

\[ \tilde{Y}(t) = Y(t) - \frac{1}{2} \bigl( s_0(t) + s_1(t) \bigr), \quad t \in \mathbb{R}. \tag{26.73} \]

Since $Y(t)$ can be recovered from $\tilde{Y}(t)$ by adding $(s_0(t) + s_1(t))/2$, the smallest 
probability of a guessing error that can be achieved based on $(\tilde{Y}(t))$ is no larger 
than that which can be achieved based on $(Y(t))$. (The two are, in fact, the same 
because $(\tilde{Y}(t))$ can be computed from $(Y(t))$.) 

The advantage of using $(\tilde{Y}(t))$ becomes apparent once we compute its conditional 
law given $H$. Conditional on $H = 0$, we have $\tilde{Y}(t) = (s_0(t) - s_1(t))/2 + N(t)$, 
whereas conditional on $H = 1$, we have $\tilde{Y}(t) = -(s_0(t) - s_1(t))/2 + N(t)$. Thus, the 
guessing problem given $(\tilde{Y}(t))$ is exactly the problem we addressed in the antipodal 
case with $(\mathbf{s}_0 - \mathbf{s}_1)/2$ playing the role of $\mathbf{s}$. We thus obtain that an optimal decision 
rule is to guess "$H = 0$" if $\int \tilde{Y}(t)\, (s_0(t) - s_1(t))/2 \, dt$ is nonnegative. Or stated in 
terms of $(Y(t))$ using (26.73): 

\[ \text{Guess ``$H = 0$'' if } \int_{-\infty}^{\infty} \Bigl( Y(t) - \frac{s_0(t) + s_1(t)}{2} \Bigr) \frac{s_0(t) - s_1(t)}{2} \, dt \ge 0. \tag{26.74} \]

The error probability associated with this decision rule is obtained from (26.71) by 
substituting $(\mathbf{s}_0 - \mathbf{s}_1)/2$ for $\mathbf{s}$: 

\[ p^*(\text{error}) = Q\Biggl( \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|_2}{\sqrt{2 N_0}} \Biggr). \tag{26.75} \]

This expression too has a nice geometric interpretation. The inner product between 
the noise and the unit-norm signal that is pointing from $\mathbf{s}_1$ to $\mathbf{s}_0$ is $\mathcal{N}(0, N_0/2)$. The 
"distance" between the signals is $\|\mathbf{s}_0 - \mathbf{s}_1\|_2$. Half the distance is $\|\mathbf{s}_0 - \mathbf{s}_1\|_2 / 2$, 
which corresponds to 

\[ \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|_2 / 2}{\sqrt{N_0/2}} \]

standard deviations of a $\mathcal{N}(0, N_0/2)$ random variable. The probability of error 
(26.75) is thus the probability that the inner product between the noise and the 
unit-norm signal that is pointing from $\mathbf{s}_1$ to $\mathbf{s}_0$ exceeds half the distance between 
the signals. 
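
The two error-probability expressions are easy to evaluate. The following sketch, under illustrative assumptions (a sampled sinusoid on a finite interval stands in for the mean signal, norms are approximated by Riemann sums, and the value of $N_0$ is arbitrary), evaluates (26.72) and (26.75) and confirms that they agree for antipodal signals and that adding a common offset to both mean signals leaves the result unchanged.

```python
import math
import numpy as np

def q_function(x):
    """Q(x) = Pr[Z > x] for a standard Gaussian Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def antipodal_error(energy, n0):
    """Error probability (26.72) for antipodal signaling with energy E_s."""
    return q_function(math.sqrt(2 * energy / n0))

def binary_error(s0, s1, dt, n0):
    """Error probability (26.75); the L2 distance is approximated by a Riemann sum."""
    distance = math.sqrt(np.sum((s0 - s1) ** 2) * dt)
    return q_function(distance / math.sqrt(2 * n0))

# Assumed example waveform and noise level (not from the text).
t = np.linspace(0.0, 1.0, 10_001)
dt = t[1] - t[0]
s = np.sin(2 * np.pi * 3 * t)
n0 = 0.25
energy = np.sum(s**2) * dt

p_antipodal = antipodal_error(energy, n0)
p_general = binary_error(s, -s, dt, n0)
p_shifted = binary_error(s + 1.0, -s + 1.0, dt, n0)   # same distance, not antipodal

print(p_antipodal, p_general, p_shifted)
```

All three values coincide because only the distance between the two mean signals enters (26.75).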

26.11.2 8-PSK 

We next present an example of detection in passband. For concreteness we consider 
8-PSK, which stands for "8-ary Phase Shift Keying." Here the number of hypotheses 
is eight, so $\mathcal{M} = \{1, 2, \ldots, 8\}$ and $\mathsf{M} = 8$. We assume that $M$ is uniformly 
distributed over $\mathcal{M}$. Conditional on $M = m$, the received signal is given by 

\[ Y(t) = s_m(t) + N(t), \quad t \in \mathbb{R}, \tag{26.76} \]

where 

\[ s_m(t) = 2 \operatorname{Re}\bigl( c_m\, s_{\mathrm{BB}}(t)\, e^{i 2\pi f_c t} \bigr), \quad t \in \mathbb{R}; \tag{26.77} \]

\[ c_m = \alpha\, e^{i 2\pi m / 8} \tag{26.78} \]

for some positive real $\alpha$; the baseband signal $\mathbf{s}_{\mathrm{BB}}$ is an integrable complex signal 
that is bandlimited to W/2 Hz and of unit energy 

\[ \|\mathbf{s}_{\mathrm{BB}}\|_2 = 1; \tag{26.79} \]

the carrier frequency $f_c$ satisfies $f_c > \mathrm{W}/2$; and $(N(t))$ is white Gaussian noise 
of PSD $N_0/2$ with respect to the bandwidth W around the carrier frequency $f_c$ 
(Definition 25.15.3). Irrespective of $M$, the transmitted energy $E_s$ is given by 

\begin{align}
E_s &= \|\mathbf{s}_m\|_2^2 \notag \\
&= \int_{-\infty}^{\infty} \Bigl( 2 \operatorname{Re}\bigl( c_m\, s_{\mathrm{BB}}(t)\, e^{i 2\pi f_c t} \bigr) \Bigr)^2 dt \notag \\
&= 2 \alpha^2, \tag{26.80}
\end{align}

as can be verified using the relationship between energy in passband and baseband 
(Theorem 7.6.10) and using (26.79). 

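The transmitted waveform $\mathbf{s}_m$ can also be written in a form that is highly suggestive 
of a choice of an orthonormal basis for $\operatorname{span}(\mathbf{s}_1, \ldots, \mathbf{s}_{\mathsf{M}})$: 

\begin{align*}
s_m(t) &= \sqrt{2} \operatorname{Re}(c_m)\, \underbrace{\sqrt{2} \operatorname{Re}\bigl( s_{\mathrm{BB}}(t)\, e^{i 2\pi f_c t} \bigr)}_{\phi_1(t)} + \sqrt{2} \operatorname{Im}(c_m)\, \underbrace{\sqrt{2} \operatorname{Re}\bigl( i\, s_{\mathrm{BB}}(t)\, e^{i 2\pi f_c t} \bigr)}_{\phi_2(t)} \\
&= \sqrt{2} \operatorname{Re}(c_m)\, \phi_1(t) + \sqrt{2} \operatorname{Im}(c_m)\, \phi_2(t),
\end{align*}

where 

\begin{align*}
\phi_1(t) &= \sqrt{2} \operatorname{Re}\bigl( s_{\mathrm{BB}}(t)\, e^{i 2\pi f_c t} \bigr), \quad t \in \mathbb{R}, \\
\phi_2(t) &= \sqrt{2} \operatorname{Re}\bigl( i\, s_{\mathrm{BB}}(t)\, e^{i 2\pi f_c t} \bigr), \quad t \in \mathbb{R}.
\end{align*}

Figure 26.5: Region of points $(t^{(1)}, t^{(2)})$ resulting in guessing "$M = 1$." 

From Theorem 7.6.10 on inner products in passband and baseband, it follows that 
$\boldsymbol{\phi}_1$ and $\boldsymbol{\phi}_2$ are orthogonal. Also, from that theorem and from (26.79), it follows 
that they are of unit energy. Thus, the tuple $(\boldsymbol{\phi}_1, \boldsymbol{\phi}_2)$ is an orthonormal basis 
for $\operatorname{span}(\mathbf{s}_1, \ldots, \mathbf{s}_{\mathsf{M}})$. Consequently, the vector $\mathbf{T} = (\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle, \langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle)^{\mathsf{T}}$ forms a 
sufficient statistic for guessing $M$ based on $(Y(t))$, and, conditional on $M = m$, the 
components of $\mathbf{T}$ are independent with $T^{(1)} \sim \mathcal{N}(\sqrt{2}\alpha \cos(2\pi m/8), N_0/2)$ and with 
$T^{(2)} \sim \mathcal{N}(\sqrt{2}\alpha \sin(2\pi m/8), N_0/2)$. We have thus reduced the guessing problem to 
that of guessing $M$ based on a two-dimensional vector $\mathbf{T}$. 

The problem of guessing $M$ based on $\mathbf{T}$ was studied in Section 21.4. To lift the 
results from that section, we need to substitute $\sqrt{2}\alpha$ for $A$ and to substitute $N_0/2$ 
for $\sigma^2$. For example, the region where we guess "$M = 1$" is given in Figure 26.5. 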

For the scenario we described, some engineers prefer to use complex random variables 
(Chapter 17). Rather than viewing $\mathbf{T}$ as a two-dimensional real random 
vector, they prefer to view it as a (scalar) complex random variable whose real 
part is $\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle$ and whose imaginary part is $\langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle$. Conditional on $M = m$, this 
CRV has the form 

\[ \sqrt{2}\, c_m + Z, \quad Z \sim \mathcal{N}_{\mathbb{C}}(0, N_0), \tag{26.81} \]

where $\mathcal{N}_{\mathbb{C}}(0, N_0)$ denotes the circularly-symmetric variance-$N_0$ complex Gaussian 
distribution (Note 24.3.13). 

The expression for the probability of error of our detector can also be lifted from 
Section 21.4. Substituting, as above, $\sqrt{2}\alpha$ for $A$ and $N_0/2$ for $\sigma^2$, we obtain 
from (21.25) that the conditional probability of error $p_{\mathrm{MAP}}(\text{error} \mid M = m)$ of our 
proposed decision rule is given for every $m \in \mathcal{M}$ by 

\[ p_{\mathrm{MAP}}(\text{error} \mid M = m) = \frac{1}{\pi} \int_0^{\pi - \psi} \exp\Bigl( -\frac{2 \alpha^2 \sin^2 \psi}{N_0 \sin^2(\theta + \psi)} \Bigr) d\theta, \quad \psi = \frac{\pi}{8}. \]

The conditional probability of error can also be expressed in terms of the transmitted 
energy $E_s$ using (26.80). Doing that and recalling that the conditional 
probability of error does not depend on the transmitted message, we obtain that 
the average probability of error $p^*(\text{error})$ is given by 

\[ p^*(\text{error}) = \frac{1}{\pi} \int_0^{\pi - \psi} \exp\Bigl( -\frac{E_s \sin^2 \psi}{N_0 \sin^2(\theta + \psi)} \Bigr) d\theta, \quad \psi = \frac{\pi}{8}. \tag{26.82} \]

Note 26.11.1. The expression (26.82) continues to hold also for $\mathsf{M}$-PSK where 
$c_m = \alpha\, e^{i 2\pi m / \mathsf{M}}$ for $\mathsf{M} \ge 2$ not necessarily equal to 8, provided that we define 
$\psi = \pi / \mathsf{M}$ in (26.82). 
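
The integral in (26.82) has no closed form in general, but it is straightforward to evaluate numerically. The sketch below is a minimal illustration under stated assumptions: the integral is approximated by a midpoint rule (the function name, the number of grid points, and the example values of $E_s$ and $N_0$ are arbitrary choices, not taken from the text). As a sanity check it uses the case of two phases, where the signals are antipodal and the result should match $Q(\sqrt{2E_s/N_0})$ from (26.72).

```python
import math
import numpy as np

def q_function(x):
    """Q(x) = Pr[Z > x] for a standard Gaussian Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def mpsk_error(num_phases, energy, n0, num_points=200_000):
    """Midpoint-rule evaluation of (26.82) with psi = pi / num_phases."""
    psi = math.pi / num_phases
    upper = math.pi - psi
    dtheta = upper / num_points
    # Midpoints avoid the endpoint theta = pi - psi, where sin(theta + psi) = 0.
    theta = (np.arange(num_points) + 0.5) * dtheta
    exponent = energy * math.sin(psi) ** 2 / (n0 * np.sin(theta + psi) ** 2)
    return float(np.sum(np.exp(-exponent)) * dtheta / math.pi)

energy, n0 = 4.0, 1.0                       # assumed example values
p_bpsk = mpsk_error(2, energy, n0)          # two phases: antipodal signaling
p_8psk = mpsk_error(8, energy, n0)

# Nearest-neighbor half-distance in standard deviations for 8-PSK.
x_8psk = math.sqrt(2 * energy / n0) * math.sin(math.pi / 8)

print(p_bpsk, q_function(math.sqrt(2 * energy / n0)))
print(p_8psk, q_function(x_8psk), 2 * q_function(x_8psk))
```

For 8-PSK the computed value should fall between the probability of crossing one boundary of the decision region and the union bound over its two boundaries, which is what the last printed line lets one compare.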



26.11.3 Orthogonal Keying 

We next consider $\mathsf{M}$-ary orthogonal keying. We assume that the RV $M$ that we 
wish to guess is uniformly distributed over the set $\mathcal{M} = \{1, \ldots, \mathsf{M}\}$, where $\mathsf{M} \ge 2$. 
The mean signals are assumed to be orthogonal and of equal (strictly) positive 
energy $E_s$: 

\[ \langle \mathbf{s}_{m'}, \mathbf{s}_{m''} \rangle = E_s\, \mathrm{I}\{m' = m''\}, \quad m', m'' \in \mathcal{M}. \tag{26.83} \]

Since $M$ is uniform, and since the mean signals are of equal energy, it follows 
from Theorem 26.6.3 that to minimize the probability of guessing incorrectly, it is 
optimal to correlate the received waveform $(Y(t))$ with each of the mean signals 
and to pick the message whose mean signal gives the highest correlation: 

\[ \text{Guess ``$m$'' if } \langle \mathbf{Y}, \mathbf{s}_m \rangle = \max_{m' \in \mathcal{M}} \langle \mathbf{Y}, \mathbf{s}_{m'} \rangle \tag{26.84} \]

with ties (which occur with probability zero) being resolved by picking a random 
message among those that attain the highest correlation. 

We next address the probability of error of this optimal decision rule. We first 
define the vector $(T^{(1)}, \ldots, T^{(\mathsf{M})})^{\mathsf{T}}$ by 

\[ T^{(\ell)} = \int_{-\infty}^{\infty} Y(t)\, \frac{s_\ell(t)}{\sqrt{E_s}} \, dt, \quad \ell \in \{1, \ldots, \mathsf{M}\} \]

and recast the decision rule as guessing "$M = m$" if $T^{(m)} = \max_{m' \in \mathcal{M}} T^{(m')}$, with 
ties being resolved at random among the components of $\mathbf{T}$ that are maximal. 

Let $p_{\mathrm{MAP}}(\text{error} \mid M = m)$ denote the conditional probability of error of this decoding 
rule, conditional on $M = m$. Conditional on $M = m$, an error occurs in two cases: 
when $m$ does not attain the highest score or when $m$ attains the highest score but 
this score is also attained by some other message and the tie is not resolved in $m$'s 
favor. Since the probability of a tie is zero (Note 26.6.4), we may ignore the second 
case and only compute the probability that an incorrect message is assigned a score 
that is (strictly) higher than the one associated with $m$. Thus, 

\begin{align}
& p_{\mathrm{MAP}}(\text{error} \mid M = m) \notag \\
&\quad = \Pr\bigl[ \max\{T^{(1)}, \ldots, T^{(m-1)}, T^{(m+1)}, \ldots, T^{(\mathsf{M})}\} > T^{(m)} \bigm| M = m \bigr]. \tag{26.85}
\end{align}

From (26.28) and the orthogonality of the signals (26.83) we have that, conditional 
on $M = m$, the random vector $\mathbf{T}$ is Gaussian with the mean of its $m$-th 
component being $\sqrt{E_s}$, the mean of its other components being zero, and with all 
the components being of variance $N_0/2$ and independent of each other. Thus, the 
conditional probability of error given $M = m$ is "the probability that at least one 
of $\mathsf{M} - 1$ IID $\mathcal{N}(0, N_0/2)$ random variables exceeds the value of a $\mathcal{N}(\sqrt{E_s}, N_0/2)$ 
random variable that is independent of them." Having recast the probability of 
error conditional on $M = m$ in a way that does not involve $m$ (the clause in quotes 
makes no reference to $m$), we conclude that the conditional probability of error 
given that $M = m$ does not depend on $m$: 

\[ p_{\mathrm{MAP}}(\text{error} \mid M = m) = p_{\mathrm{MAP}}(\text{error} \mid M = 1), \quad m \in \mathcal{M}. \tag{26.86} \]

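This conditional probability of error can be computed starting from (26.85) as: 

\begin{align}
& p_{\mathrm{MAP}}(\text{error} \mid M = 1) \notag \\
&= \Pr\bigl[ \max\{T^{(2)}, \ldots, T^{(\mathsf{M})}\} > T^{(1)} \bigm| M = 1 \bigr] \notag \\
&= 1 - \Pr\bigl[ \max\{T^{(2)}, \ldots, T^{(\mathsf{M})}\} \le T^{(1)} \bigm| M = 1 \bigr] \notag \\
&= 1 - \int_{-\infty}^{\infty} f_{T^{(1)}|M=1}(t)\, \Pr\bigl[ \max\{T^{(2)}, \ldots, T^{(\mathsf{M})}\} \le t \bigm| M = 1, T^{(1)} = t \bigr] \, dt \notag \\
&= 1 - \int_{-\infty}^{\infty} f_{T^{(1)}|M=1}(t)\, \Pr\bigl[ \max\{T^{(2)}, \ldots, T^{(\mathsf{M})}\} \le t \bigm| M = 1 \bigr] \, dt \notag \\
&= 1 - \int_{-\infty}^{\infty} f_{T^{(1)}|M=1}(t)\, \Pr\bigl[ T^{(2)} \le t, \ldots, T^{(\mathsf{M})} \le t \bigm| M = 1 \bigr] \, dt \notag \\
&= 1 - \int_{-\infty}^{\infty} f_{T^{(1)}|M=1}(t)\, \Bigl( \Pr\bigl[ T^{(2)} \le t \bigm| M = 1 \bigr] \Bigr)^{\mathsf{M}-1} dt \notag \\
&= 1 - \int_{-\infty}^{\infty} f_{T^{(1)}|M=1}(t)\, \Bigl( 1 - Q\Bigl( \frac{t}{\sqrt{N_0/2}} \Bigr) \Bigr)^{\mathsf{M}-1} dt \notag \\
&= 1 - \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi N_0}}\, e^{-\frac{(t - \sqrt{E_s})^2}{N_0}}\, \Bigl( 1 - Q\Bigl( \frac{t}{\sqrt{N_0/2}} \Bigr) \Bigr)^{\mathsf{M}-1} dt \notag \\
&= 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\tau^2/2}\, \Bigl( 1 - Q\Bigl( \tau + \sqrt{\frac{2 E_s}{N_0}} \Bigr) \Bigr)^{\mathsf{M}-1} d\tau, \tag{26.87}
\end{align}

where the first equality follows from (26.85); the second because the conditional 
probability of an event and its complement add to one; the third by conditioning 
on $T^{(1)} = t$ and integrating it out, i.e., by noting that for any random variable $X$ 
of density $f_X(\cdot)$ and for any random variable $Y$, 

\[ \Pr[Y \le X] = \int_{-\infty}^{\infty} f_X(x)\, \Pr[Y \le x \mid X = x] \, dx, \tag{26.88} \]

with $X$ here being equal to $T^{(1)}$ and with $Y$ here being $\max\{T^{(2)}, \ldots, T^{(\mathsf{M})}\}$; the 
fourth from the conditional independence of $T^{(1)}$ and $(T^{(2)}, \ldots, T^{(\mathsf{M})})$ given $M = 1$, 
which implies the conditional independence of $T^{(1)}$ and $\max\{T^{(2)}, \ldots, T^{(\mathsf{M})}\}$ given 
$M = 1$; the fifth because the maximum of random variables does not exceed $t$ if, 
and only if, none of them exceeds $t$ 

\[ \bigl( \max\{T^{(2)}, \ldots, T^{(\mathsf{M})}\} \le t \bigr) \iff \bigl( T^{(2)} \le t, \ldots, T^{(\mathsf{M})} \le t \bigr); \]

the sixth because, conditional on $M = 1$, the random variables $T^{(2)}, \ldots, T^{(\mathsf{M})}$ are 
IID so 

\[ \Pr\bigl[ T^{(2)} \le t, \ldots, T^{(\mathsf{M})} \le t \bigm| M = 1 \bigr] = \Bigl( \Pr\bigl[ T^{(2)} \le t \bigm| M = 1 \bigr] \Bigr)^{\mathsf{M}-1}; \]

the seventh because, conditional on $M = 1$, we have $T^{(2)} \sim \mathcal{N}(0, N_0/2)$ and using 
(19.12b); the eighth because, conditional on $M = 1$, we have $T^{(1)} \sim \mathcal{N}(\sqrt{E_s}, N_0/2)$ 
so its conditional density can be written explicitly using (19.6); and the final equality 
using the change of variable 

\[ \tau = \frac{t - \sqrt{E_s}}{\sqrt{N_0/2}}. \tag{26.89} \]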

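Using (26.86) and (26.87) we obtain that if $p^*(\text{error})$ denotes the unconditional
probability of error, then $p^*(\text{error}) = p_{\text{MAP}}(\text{error} \mid M = 1)$ and

$$p^*(\text{error}) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\tau^2/2} \biggl(1 - Q\Bigl(\tau + \sqrt{\tfrac{2E_s}{N_0}}\Bigr)\biggr)^{M-1} d\tau. \tag{26.90}$$

An alternative expression for the probability of error can be derived using the
Binomial Expansion

$$(a+b)^n = \sum_{j=0}^{n} \binom{n}{j} a^{n-j} b^{j}, \quad n \in \mathbb{N},\ a, b \in \mathbb{R}. \tag{26.91}$$

Substituting

$$a = 1, \quad b = -Q\Bigl(\tau + \sqrt{\tfrac{2E_s}{N_0}}\Bigr), \quad n = M - 1$$

in (26.91) yields

$$\biggl(1 - Q\Bigl(\tau + \sqrt{\tfrac{2E_s}{N_0}}\Bigr)\biggr)^{M-1} = \sum_{j=0}^{M-1} (-1)^{j} \binom{M-1}{j} Q^{j}\Bigl(\tau + \sqrt{\tfrac{2E_s}{N_0}}\Bigr),$$

from which we obtain from (26.90) (using the linearity of integration and the fact
that the Gaussian density integrates to one)

$$p^*(\text{error}) = \sum_{j=1}^{M-1} (-1)^{j+1} \binom{M-1}{j} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\tau^2/2}\, Q^{j}\Bigl(\tau + \sqrt{\tfrac{2E_s}{N_0}}\Bigr) d\tau. \tag{26.92}$$

For the case where M = 2 the expression (26.90) for the probability of error can
be simplified to

$$p^*(\text{error}) = Q\Bigl(\sqrt{\tfrac{E_s}{N_0}}\Bigr), \quad M = 2, \tag{26.93}$$

as we proceed to show in two different ways. The first way is to note that for
M = 2 the probability of error can be expressed, using (26.85) and (26.86), as

$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid M = 1) &= \Pr\bigl[T^{(2)} \ge T^{(1)} \bigm| M = 1\bigr] \\
&= \Pr\bigl[T^{(2)} - T^{(1)} \ge 0 \bigm| M = 1\bigr] \\
&= Q\Bigl(\sqrt{\tfrac{E_s}{N_0}}\Bigr),
\end{aligned}$$

where the last equality follows because, conditional on M = 1, the random variables
$T^{(1)}$ and $T^{(2)}$ are independent Gaussians of variance $N_0/2$ with the first
having mean $\sqrt{E_s}$ and the second having zero mean, so their difference $T^{(2)} - T^{(1)}$
is $\mathcal{N}(-\sqrt{E_s}, N_0)$. (The probability that a $\mathcal{N}(-\sqrt{E_s}, N_0)$ RV exceeds zero can be
computed using (19.12a).) The second way of showing (26.93) is to use (26.75) and
to note that the orthogonality of $\mathbf{s}_1$ and $\mathbf{s}_2$ implies
$\|\mathbf{s}_1 - \mathbf{s}_2\|_2^2 = \|\mathbf{s}_1\|_2^2 + \|\mathbf{s}_2\|_2^2 = 2E_s$.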
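As a numerical sanity check, the integral in (26.90) can be approximated and compared with the closed form (26.93) for M = 2. The sketch below is only illustrative: the function names, the truncation of the integral to [-10, 10], and the midpoint rule with 20000 steps are our own assumptions and not part of the text.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def p_error_orthogonal(num_signals, es_over_n0, half_width=10.0, steps=20000):
    """Approximate (26.90) with the midpoint rule on [-half_width, half_width].

    num_signals plays the role of M and es_over_n0 the role of E_s / N_0.
    """
    shift = math.sqrt(2.0 * es_over_n0)
    d_tau = 2.0 * half_width / steps
    p_correct = 0.0
    for i in range(steps):
        tau = -half_width + (i + 0.5) * d_tau
        density = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
        p_correct += density * (1.0 - q_func(tau + shift)) ** (num_signals - 1) * d_tau
    return 1.0 - p_correct


if __name__ == "__main__":
    # For M = 2 the result should agree with Q(sqrt(E_s / N_0)) of (26.93).
    print(p_error_orthogonal(2, 4.0), q_func(math.sqrt(4.0)))
```

For a fixed ratio $E_s/N_0$ the computed error probability grows with M, as one would expect when more orthogonal signals of the same energy compete.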



26.11.4 The M-ary Simplex

We next describe a detection problem that is intimately related to the problem we 
addressed in Section 26.11.3. To motivate the problem we first note: 

Proposition 26.11.2. Consider the setup described in Section 26.2. If $\mathbf{s}$ is any
integrable signal that is bandlimited to W Hz, then the probability of error associated
with the mean signals $\{\mathbf{s}_1, \ldots, \mathbf{s}_M\}$ and the prior $\{\pi_m\}$ is the same as with the mean
signals $\{\mathbf{s}_1 - \mathbf{s}, \ldots, \mathbf{s}_M - \mathbf{s}\}$ and the same prior.

Proof. We have essentially given a proof of this result in Section 14.3 and also
in Section 26.11.1 in our analysis of nonantipodal signaling. The idea is that,
by subtracting the signal $\mathbf{s}$ from the received waveform, the receiver can make the
problem with mean signals $\{\mathbf{s}_1, \ldots, \mathbf{s}_M\}$ appear as though it were the problem with
mean signals $\{\mathbf{s}_1 - \mathbf{s}, \ldots, \mathbf{s}_M - \mathbf{s}\}$. Conversely, by adding $\mathbf{s}$, the receiver can make
the latter appear as though it were the former. Consequently, the best performance
achievable in the two settings must be identical. □

The expected transmitted energy when employing the mean signals $\{\mathbf{s}_1, \ldots, \mathbf{s}_M\}$
may be different than when employing the mean signals $\{\mathbf{s}_1 - \mathbf{s}, \ldots, \mathbf{s}_M - \mathbf{s}\}$. In
subtracting the signal $\mathbf{s}$ one can change the average transmitted energy for better
or worse. As we argued in Section 14.3, to minimize the expected transmitted
energy, one should choose $\mathbf{s}$ to correspond to the "center of gravity" of the mean
signals:

Proposition 26.11.3. Let the prior $\{\pi_m\}$ and mean signals $\{\mathbf{s}_m\}$ be given. Let

$$\mathbf{s}^* = \sum_{m \in \mathcal{M}} \pi_m\, \mathbf{s}_m. \tag{26.94}$$

Then, for any energy-limited signal $\mathbf{s}$,

$$\sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}^*\|_2^2 \le \sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}\|_2^2, \tag{26.95}$$

with equality if, and only if, $\mathbf{s}$ is indistinguishable from $\mathbf{s}^*$.


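Proof. Writing $\mathbf{s}_m - \mathbf{s}$ as $(\mathbf{s}_m - \mathbf{s}^*) + (\mathbf{s}^* - \mathbf{s})$ we have

$$\begin{aligned}
\sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}\|_2^2
&= \sum_{m \in \mathcal{M}} \pi_m \bigl\|(\mathbf{s}_m - \mathbf{s}^*) + (\mathbf{s}^* - \mathbf{s})\bigr\|_2^2 \\
&= \sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}^*\|_2^2 + \sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}^* - \mathbf{s}\|_2^2 + 2 \sum_{m \in \mathcal{M}} \pi_m \langle \mathbf{s}_m - \mathbf{s}^*, \mathbf{s}^* - \mathbf{s} \rangle \\
&= \sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}^*\|_2^2 + \|\mathbf{s}^* - \mathbf{s}\|_2^2 + 2 \Bigl\langle \sum_{m \in \mathcal{M}} \pi_m (\mathbf{s}_m - \mathbf{s}^*),\, \mathbf{s}^* - \mathbf{s} \Bigr\rangle \\
&= \sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}^*\|_2^2 + \|\mathbf{s}^* - \mathbf{s}\|_2^2 + 2 \langle \mathbf{s}^* - \mathbf{s}^*, \mathbf{s}^* - \mathbf{s} \rangle \\
&= \sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}^*\|_2^2 + \|\mathbf{s}^* - \mathbf{s}\|_2^2 \\
&\ge \sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}^*\|_2^2,
\end{aligned}$$

with the inequality being an equality if, and only if, $\|\mathbf{s}^* - \mathbf{s}\|_2 = 0$. □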



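We can now construct the simplex signals as follows. We start with M orthonormal
waveforms $\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_M$

$$\langle \boldsymbol{\phi}_{m'}, \boldsymbol{\phi}_{m''} \rangle = I\{m' = m''\}, \quad m', m'' \in \mathcal{M} \tag{26.96}$$

that are integrable and bandlimited to W Hz. We set $\bar{\boldsymbol{\phi}}$ to be their "center of
gravity" with respect to the uniform prior

$$\bar{\boldsymbol{\phi}} = \frac{1}{M} \sum_{m \in \mathcal{M}} \boldsymbol{\phi}_m. \tag{26.97}$$

Using (26.96), (26.97), and the basic properties of the inner product (3.6)-(3.10)
it is easily verified that

$$\langle \boldsymbol{\phi}_{m'} - \bar{\boldsymbol{\phi}},\, \boldsymbol{\phi}_{m''} - \bar{\boldsymbol{\phi}} \rangle = I\{m' = m''\} - \frac{1}{M}, \quad m', m'' \in \mathcal{M}. \tag{26.98}$$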




Figure 26.6: Starting with two orthonormal signals and subtracting the "center of 
gravity" from each we obtain two antipodal signals. Scaling these antipodal signals 
results in the simplex constellation with two signals. 





Figure 26.7: Constructing the simplex constellation with three points from three 
orthonormal signals. Left figure depicts the orthonormal constellation and its cen- 
ter of gravity; middle figure depicts the result of subtracting the center of gravity, 
and the right figure depicts the result of scaling (from a different perspective). 



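We now define the M-ary simplex constellation with energy $E_s$ by

$$\mathbf{s}_m = \sqrt{E_s}\, \sqrt{\frac{M}{M-1}}\, \bigl(\boldsymbol{\phi}_m - \bar{\boldsymbol{\phi}}\bigr), \quad m \in \mathcal{M}. \tag{26.99}$$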

The construction for the case where M = 2 is depicted in Figure 26.6. It yields 
the binary antipodal signaling scheme. The construction for M = 3 is depicted in 
Figure 26.7. 

From (26.99) and (26.98) we obtain for distinct $m', m'' \in \mathcal{M}$

$$\|\mathbf{s}_{m'}\|_2^2 = E_s \quad \text{and} \quad \langle \mathbf{s}_{m'}, \mathbf{s}_{m''} \rangle = -\frac{E_s}{M-1}. \tag{26.100}$$

Also, from (26.99) we see that $\{\mathbf{s}_m\}$ can be viewed as the result of subtracting the
center of gravity from orthogonal signals of energy $E_s M/(M-1)$. Consequently,
the least error probability that can be achieved in detecting simplex signals of
energy $E_s$ is the same as the least error probability that can be achieved in detecting
orthogonal signals of energy

$$\frac{M}{M-1}\, E_s \tag{26.101}$$

(Proposition 26.11.2). From the expression for the error probability in orthogonal
signaling (26.90) we obtain for the simplex signals with a uniform prior

$$p^*(\text{error}) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\tau^2/2} \biggl(1 - Q\Bigl(\tau + \sqrt{\tfrac{M}{M-1}\,\tfrac{2E_s}{N_0}}\Bigr)\biggr)^{M-1} d\tau. \tag{26.102}$$

Figure 26.8: Adding a properly scaled signal $\boldsymbol{\psi}$ that is orthogonal to all the
elements of a simplex constellation results in an orthogonal constellation.

The decision rule for the simplex constellation can also be derived by exploiting the
relationship to orthogonal keying. For example, if $\boldsymbol{\psi}$ is a unit-energy integrable signal
that is bandlimited to W Hz and that is orthogonal to the signals $\{\mathbf{s}_1, \ldots, \mathbf{s}_M\}$,
then, by (26.100), the waveforms

$$\Bigl\{ \mathbf{s}_m + \frac{1}{\sqrt{M-1}} \sqrt{E_s}\, \boldsymbol{\psi} \Bigr\}_{m \in \mathcal{M}} \tag{26.103}$$

are orthogonal, each of energy $E_s M/(M-1)$. (See Figure 26.8 for a demonstration
of the process of obtaining an orthogonal constellation with M = 2 signals
by adding a signal $\boldsymbol{\psi}$ to each of the signals in a binary simplex constellation.)
Consequently, in order to decode the simplex signals contaminated by white Gaussian
noise with respect to the bandwidth W, we can add $\frac{1}{\sqrt{M-1}} \sqrt{E_s}\, \boldsymbol{\psi}$ to the
received waveform and then feed the result to an optimal detector for orthogonal
keying.
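The construction (26.97)-(26.99) and the relations (26.100) and (26.103) can be checked in finite dimensions by representing each waveform through its coefficients in the orthonormal tuple. The following sketch does this with plain Python lists; the function names and the choice of the standard basis as a stand-in for $\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_M$ are illustrative assumptions.

```python
import math


def inner(u, v):
    """Inner product of two coefficient vectors."""
    return sum(a * b for a, b in zip(u, v))


def simplex_constellation(num_signals, energy):
    """Coefficients of the simplex signals (26.99) in the basis phi_1..phi_M.

    The standard basis stands in for the orthonormal waveforms, so the center
    of gravity (26.97) has every coefficient equal to 1/M. Requires M >= 2.
    """
    m = num_signals
    scale = math.sqrt(energy) * math.sqrt(m / (m - 1))
    center = 1.0 / m
    return [[scale * ((1.0 if j == i else 0.0) - center) for j in range(m)]
            for i in range(m)]


def add_orthogonal_component(signals, energy):
    """Append the coefficient of psi from (26.103) as an extra coordinate."""
    m = len(signals)
    extra = math.sqrt(energy / (m - 1))
    return [s + [extra] for s in signals]


if __name__ == "__main__":
    sig = simplex_constellation(3, 2.0)
    print([round(inner(s, s), 6) for s in sig])   # energies
    print(round(inner(sig[0], sig[1]), 6))        # cross inner product
```

With M = 3 and $E_s = 2$ the energies come out as $E_s$ and the cross inner products as $-E_s/(M-1)$, in agreement with (26.100); after appending the $\boldsymbol{\psi}$ coordinate the signals become orthogonal with energy $E_s M/(M-1)$.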



26.11.5 Bi-Orthogonal Keying 

Starting with an orthogonal constellation, we can double the number of signals
without reducing the minimum distance. This construction, which results in the
"bi-orthogonal signal set" is the topic of this section. To construct the bi-orthogonal
signal set with $2\kappa$ signals, we start with $\kappa \ge 1$ orthonormal signals $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\kappa)$
and define the $2\kappa$ bi-orthogonal signal set $\{\mathbf{s}_{1,u}, \mathbf{s}_{1,d}, \ldots, \mathbf{s}_{\kappa,u}, \mathbf{s}_{\kappa,d}\}$ by

$$\mathbf{s}_{\nu,u} = +\sqrt{E_s}\, \boldsymbol{\phi}_\nu \quad \text{and} \quad \mathbf{s}_{\nu,d} = -\sqrt{E_s}\, \boldsymbol{\phi}_\nu, \quad \nu \in \{1, \ldots, \kappa\}. \tag{26.104}$$

We can think of "u" as standing for "up" and of "d" as standing for "down," so to
each signal $\boldsymbol{\phi}_\nu$ there correspond two signals in the bi-orthogonal constellation: the
"up signal" that corresponds to multiplying $\sqrt{E_s}\, \boldsymbol{\phi}_\nu$ by +1 and the "down signal"
that corresponds to multiplying $\sqrt{E_s}\, \boldsymbol{\phi}_\nu$ by −1. Only bi-orthogonal signal sets with
an even number of signals are defined. The constructed signals are all of energy $E_s$:

$$\|\mathbf{s}_{\nu,u}\|_2^2 = \|\mathbf{s}_{\nu,d}\|_2^2 = E_s, \quad \nu \in \{1, \ldots, \kappa\}. \tag{26.105}$$

Figure 26.9: A bi-orthogonal constellation with six signals.



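A bi-orthogonal constellation with six points ($\kappa = 3$) is depicted in Figure 26.9.
Suppose that each of the signals $\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\kappa$ is an integrable signal that is bandlimited
to W Hz and that, consequently, so are all the signals in the constructed
bi-orthogonal signal set. A signal is picked uniformly at random from the signal set
and is sent over a channel. We observe the stochastic process (Y(t)) given by the
sum of the transmitted signal and white Gaussian noise of PSD $N_0/2$ with respect
to the bandwidth W. How should we guess which signal was sent?

Since the signal was chosen equiprobably, and since all the signals in the signal set
are of the same energy, it is optimal to consider the inner products

$$\langle \mathbf{Y}, \mathbf{s}_{1,u} \rangle, \langle \mathbf{Y}, \mathbf{s}_{1,d} \rangle, \ldots, \langle \mathbf{Y}, \mathbf{s}_{\kappa,u} \rangle, \langle \mathbf{Y}, \mathbf{s}_{\kappa,d} \rangle \tag{26.106}$$

and to pick the signal in the signal set corresponding to the largest of these inner
products. By (26.104) we have for every $\nu \in \{1, \ldots, \kappa\}$ that $\mathbf{s}_{\nu,u} = -\mathbf{s}_{\nu,d}$ so
$\langle \mathbf{Y}, \mathbf{s}_{\nu,u} \rangle = -\langle \mathbf{Y}, \mathbf{s}_{\nu,d} \rangle$ and hence

$$\max\bigl\{ \langle \mathbf{Y}, \mathbf{s}_{\nu,u} \rangle, \langle \mathbf{Y}, \mathbf{s}_{\nu,d} \rangle \bigr\} = \bigl| \langle \mathbf{Y}, \mathbf{s}_{\nu,u} \rangle \bigr|, \quad \nu \in \{1, \ldots, \kappa\}. \tag{26.107}$$

Equivalently, by (26.104),

$$\max\bigl\{ \langle \mathbf{Y}, \mathbf{s}_{\nu,u} \rangle, \langle \mathbf{Y}, \mathbf{s}_{\nu,d} \rangle \bigr\} = \sqrt{E_s}\, \bigl| \langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle \bigr|, \quad \nu \in \{1, \ldots, \kappa\}.$$

To find the maximum of the $2\kappa$ terms in (26.106) we can first compute for each
$\nu \in \{1, \ldots, \kappa\}$ the maximum between $\langle \mathbf{Y}, \mathbf{s}_{\nu,u} \rangle$ and $\langle \mathbf{Y}, \mathbf{s}_{\nu,d} \rangle$ and then compute the
maximum of the $\kappa$ results:

$$\begin{aligned}
&\max\bigl\{ \langle \mathbf{Y}, \mathbf{s}_{1,u} \rangle, \langle \mathbf{Y}, \mathbf{s}_{1,d} \rangle, \ldots, \langle \mathbf{Y}, \mathbf{s}_{\kappa,u} \rangle, \langle \mathbf{Y}, \mathbf{s}_{\kappa,d} \rangle \bigr\} \\
&\quad = \max\Bigl\{ \max\bigl\{ \langle \mathbf{Y}, \mathbf{s}_{1,u} \rangle, \langle \mathbf{Y}, \mathbf{s}_{1,d} \rangle \bigr\}, \ldots, \max\bigl\{ \langle \mathbf{Y}, \mathbf{s}_{\kappa,u} \rangle, \langle \mathbf{Y}, \mathbf{s}_{\kappa,d} \rangle \bigr\} \Bigr\}.
\end{aligned}$$

Using this approach, we obtain from (26.107) the following optimal two-step procedure:
first find which $\nu^*$ in $\{1, \ldots, \kappa\}$ attains the maximum of the absolute values
of the inner products

$$\max_{\nu \in \{1, \ldots, \kappa\}} \bigl| \langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle \bigr|$$

and then, after you have found $\nu^*$, guess "$\mathbf{s}_{\nu^*,u}$" if $\langle \mathbf{Y}, \boldsymbol{\phi}_{\nu^*} \rangle > 0$ and guess "$\mathbf{s}_{\nu^*,d}$"
if $\langle \mathbf{Y}, \boldsymbol{\phi}_{\nu^*} \rangle \le 0$.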
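The two-step procedure can be sketched in a few lines of Python operating on the already computed inner products $\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle$. The function names, the zero-based indexing, and the brute-force comparison over all $2\kappa$ inner products of (26.106) are our own illustrative choices.

```python
def decode_biorthogonal(projections):
    """Two-step rule: projections[v] stands for <Y, phi_(v+1)>.

    Returns (v, direction), where v is the zero-based index attaining the
    largest absolute value and direction is "u" or "d".
    """
    best = max(range(len(projections)), key=lambda v: abs(projections[v]))
    direction = "u" if projections[best] > 0 else "d"
    return best, direction


def decode_bruteforce(projections, energy=1.0):
    """Direct maximization over the 2*kappa inner products in (26.106)."""
    scale = energy ** 0.5
    candidates = []
    for v, p in enumerate(projections):
        candidates.append((scale * p, v, "u"))    # <Y, s_{v,u}>
        candidates.append((-scale * p, v, "d"))   # <Y, s_{v,d}>
    _, v, direction = max(candidates, key=lambda c: c[0])
    return v, direction


if __name__ == "__main__":
    print(decode_biorthogonal([0.2, -1.5, 0.7]))
```

For the projections (0.2, −1.5, 0.7) both functions pick the second basis signal with direction "d", because −1.5 has the largest magnitude and is negative.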

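We next compute the probability of error of this optimal guessing rule. It is not
difficult to see that the conditional probability of error does not depend on the
message we condition on. For concreteness, we shall analyze the probability of
error associated with the message corresponding to the signal $\mathbf{s}_{1,u}$, a probability
that we denote by $p_{\text{MAP}}(\text{error} \mid \mathbf{s}_{1,u})$, with the corresponding conditional probability
of correct decoding $p_{\text{MAP}}(\text{correct} \mid \mathbf{s}_{1,u}) = 1 - p_{\text{MAP}}(\text{error} \mid \mathbf{s}_{1,u})$. To simplify the
typesetting, we shall denote the conditional probability of the event $\mathcal{A}$ given that
$\mathbf{s}_{1,u}$ is sent by $\Pr(\mathcal{A} \mid \mathbf{s}_{1,u})$.

Since the probability of ties in the likelihood function is zero (Note 26.6.4)

$$\begin{aligned}
p_{\text{MAP}}(\text{correct} \mid \mathbf{s}_{1,u})
&= \Pr\Bigl( -\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \le \langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \text{ and } \max_{2 \le \nu \le \kappa} |\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle| \le \langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \Bigm| \mathbf{s}_{1,u} \Bigr) \\
&= \Pr\Bigl( \langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \ge 0 \text{ and } \max_{2 \le \nu \le \kappa} |\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle| \le \langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \Bigm| \mathbf{s}_{1,u} \Bigr) \\
&= \int_0^\infty f_{\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \mid \mathbf{s}_{1,u}}(t)\, \Pr\Bigl[ \max_{2 \le \nu \le \kappa} |\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle| \le t \Bigm| \mathbf{s}_{1,u}, \langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle = t \Bigr] dt \\
&= \int_0^\infty f_{\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \mid \mathbf{s}_{1,u}}(t)\, \Pr\Bigl[ \max_{2 \le \nu \le \kappa} |\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle| \le t \Bigm| \mathbf{s}_{1,u} \Bigr] dt \\
&= \int_0^\infty f_{\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \mid \mathbf{s}_{1,u}}(t)\, \Bigl( \Pr\bigl[ |\langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle| \le t \bigm| \mathbf{s}_{1,u} \bigr] \Bigr)^{\kappa-1} dt \\
&= \int_0^\infty f_{\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \mid \mathbf{s}_{1,u}}(t)\, \biggl( 1 - 2 Q\Bigl( \frac{t}{\sqrt{N_0/2}} \Bigr) \biggr)^{\kappa-1} dt \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{2E_s/N_0}}^{\infty} e^{-\tau^2/2} \biggl( 1 - 2 Q\Bigl( \tau + \sqrt{\tfrac{2E_s}{N_0}} \Bigr) \biggr)^{\kappa-1} d\tau,
\end{aligned} \tag{26.108}$$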

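with the following justification. The first equality follows from the definition of
our optimal decoder and from the fact that ties occur with probability zero. The
second equality follows by trivial algebra ($-\xi \le \xi$ if, and only if, $\xi \ge 0$). The third
equality follows by conditioning on $\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle$ being equal to t and integrating t
out while noting that a correct decision can only be made if $t \ge 0$, in which case
the condition $\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \ge 0$ is satisfied automatically. The fourth equality follows
because, conditional on the signal $\mathbf{s}_{1,u}$ being sent, the random variable $\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle$ is
independent of the random variables $\{|\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle|\}_{2 \le \nu \le \kappa}$. The fifth equality follows
because, conditional on $\mathbf{s}_{1,u}$ being sent, the random variables $\{|\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle|\}_{2 \le \nu \le \kappa}$ are
IID. The sixth equality follows because, conditional on $\mathbf{s}_{1,u}$ being sent, we have
$\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \sim \mathcal{N}\bigl(\sqrt{E_s}, N_0/2\bigr)$ and $\langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle \sim \mathcal{N}(0, N_0/2)$, so

$$\begin{aligned}
\Pr\bigl[ |\langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle| \le t \bigm| \mathbf{s}_{1,u} \bigr]
&= 1 - \Pr\bigl[ |\langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle| > t \bigm| \mathbf{s}_{1,u} \bigr] \\
&= 1 - \Pr\bigl[ \langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle > t \bigm| \mathbf{s}_{1,u} \bigr] - \Pr\bigl[ \langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle < -t \bigm| \mathbf{s}_{1,u} \bigr] \\
&= 1 - \Pr\biggl[ \frac{\langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle}{\sqrt{N_0/2}} > \frac{t}{\sqrt{N_0/2}} \biggm| \mathbf{s}_{1,u} \biggr] - \Pr\biggl[ \frac{\langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle}{\sqrt{N_0/2}} < -\frac{t}{\sqrt{N_0/2}} \biggm| \mathbf{s}_{1,u} \biggr] \\
&= 1 - 2 Q\Bigl( \frac{t}{\sqrt{N_0/2}} \Bigr), \quad t \ge 0.
\end{aligned}$$

Finally, (26.108) follows from the substitution $\tau = (t - \sqrt{E_s})/\sqrt{N_0/2}$ as in (26.89).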

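Since the conditional probability of error does not depend on the message, it follows
that all conditional probabilities of error are equal to the average probability of
error $p^*(\text{error})$ and

$$p^*(\text{error}) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{2E_s/N_0}}^{\infty} e^{-\tau^2/2} \biggl( 1 - 2 Q\Bigl( \tau + \sqrt{\tfrac{2E_s}{N_0}} \Bigr) \biggr)^{\kappa-1} d\tau \tag{26.109}$$

or, using the Binomial Expansion (26.91) with the substitution of $-2Q\bigl(\tau + \sqrt{2E_s/N_0}\bigr)$
for b and of 1 for a,

$$p^*(\text{error}) = Q\Bigl( \sqrt{\tfrac{2E_s}{N_0}} \Bigr) + \sum_{j=1}^{\kappa-1} (-1)^{j+1}\, 2^{j} \binom{\kappa-1}{j} \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{2E_s/N_0}}^{\infty} e^{-\tau^2/2}\, Q^{j}\Bigl( \tau + \sqrt{\tfrac{2E_s}{N_0}} \Bigr) d\tau, \tag{26.110}$$

where the leading term arises because the standard Gaussian density integrates to
$1 - Q\bigl(\sqrt{2E_s/N_0}\bigr)$ over the range of integration.

The probability of error associated with an orthogonal constellation with $\kappa$ signals
is better than that of the bi-orthogonal constellation with $2\kappa$ signals and equal
average energy. But the comparison is not quite fair because the bi-orthogonal
constellation is richer.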
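The integral expression for the error probability of bi-orthogonal keying can be evaluated numerically. The sketch below approximates the probability of a correct decision by integrating the standard Gaussian density times $(1 - 2Q(\tau + \sqrt{2E_s/N_0}))^{\kappa-1}$ from $-\sqrt{2E_s/N_0}$ upward, as in (26.108); for $\kappa = 1$ the constellation is binary antipodal and the result should reduce to $Q(\sqrt{2E_s/N_0})$. The function names, the upper truncation point, and the midpoint rule are assumptions made for illustration.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def p_error_biorthogonal(kappa, es_over_n0, upper=10.0, steps=20000):
    """Approximate the bi-orthogonal error probability by the midpoint rule.

    kappa is the number of orthonormal signals (2 * kappa messages) and
    es_over_n0 stands for E_s / N_0. The integral over tau is truncated
    to the interval [-sqrt(2 E_s / N_0), upper].
    """
    shift = math.sqrt(2.0 * es_over_n0)
    d_tau = (upper + shift) / steps
    p_correct = 0.0
    for i in range(steps):
        tau = -shift + (i + 0.5) * d_tau
        density = math.exp(-0.5 * tau * tau) / math.sqrt(2.0 * math.pi)
        p_correct += density * (1.0 - 2.0 * q_func(tau + shift)) ** (kappa - 1) * d_tau
    return 1.0 - p_correct


if __name__ == "__main__":
    print(p_error_biorthogonal(1, 2.0), q_func(2.0))
    print(p_error_biorthogonal(3, 2.0))
```

Increasing $\kappa$ at a fixed $E_s/N_0$ increases the computed error probability, which reflects the richer constellation.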

26.12 Detection in Colored Noise 



Our focus throughout has been on the detection problem when the noise is "white"
in the sense that its PSD is flat over the frequency band to which the mean signals
are limited. We now extend the discussion to "colored" noise, i.e., to noise
whose PSD is not constant over the bandwidth of interest. We continue to assume
that the mean signals $\{\mathbf{s}_m\}_{m \in \mathcal{M}}$ are integrable signals that are bandlimited
to W Hz and that the noise (N(t)) is independent of the message M and is a
measurable, stationary, Gaussian SP. Its PSD $S_{NN}$, however, is now an arbitrary
nonnegative, symmetric, integrable function that is not necessarily constant over
the band [−W, W]. Conditional on M = m, the received waveform (Y(t)) is given
at time t by $s_m(t) + N(t)$.

Our approach is based on "whitening the noise" and is only applicable when the
noise can be whitened with respect to the bandwidth W, i.e., when there exists a
whitening filter for the noise with respect to W:

Definition 26.12.1 (Whitening Filter for $S_{NN}$ with respect to W). A filter of impulse
response $h \colon \mathbb{R} \to \mathbb{R}$ is said to be a whitening filter for $S_{NN}$ (or for (N(t)))
with respect to the bandwidth W if it is stable and its frequency response $\hat{h}$
satisfies

$$S_{NN}(f)\, |\hat{h}(f)|^2 = 1, \quad |f| \le W. \tag{26.111}$$

Only the magnitude of the frequency response of the whitening filter is specified in
(26.111) and only for frequencies in the band [−W, W]. The response is unspecified
outside this band. Consequently:

Note 26.12.2. There may be many different whitening filters for $S_{NN}$ with respect
to the bandwidth W.

If $S_{NN}$ is zero at some frequencies in [−W, W], then there is no whitening filter
for $S_{NN}$ with respect to W. Likewise, a whitening filter for $S_{NN}$ does not exist
if $S_{NN}$ is not continuous in [−W, W] (because the frequency response of a stable
filter must be continuous (Theorem 6.2.11), and if $S_{NN}$ is discontinuous, then so is
$f \mapsto |\hat{h}(f)|$). Thus:

Note 26.12.3. There does not always exist a whitening filter for $S_{NN}$ with respect
to W.

We shall see, however, in Proposition 26.12.8 that a whitening filter exists whenever
throughout the interval [−W, W] the PSD $S_{NN}$ is strictly positive and is twice
continuously differentiable.

The filter is called "whitening" because, by Theorem 25.13.2, we have: 

Proposition 26.12.4. If $(N(t), t \in \mathbb{R})$ is a measurable, stationary, Gaussian SP
of PSD $S_{NN}$, and if h is the impulse response of a whitening filter for $S_{NN}$ with
respect to W, then $(N(t)) \star h$ is white Gaussian noise of PSD 1 with respect to the
bandwidth W.

Assuming that the noise can be whitened with respect to the bandwidth W, we
pick some whitening filter of impulse response h and denote by $(\tilde{Y}(t))$ the result
of feeding the observed SP (Y(t)) to this filter:

$$(\tilde{Y}(t)) = (Y(t)) \star h. \tag{26.112}$$

Conditional on M = m, the output of the whitening filter is given by

$$\begin{aligned}
\tilde{Y}(t) &= \bigl( (\mathbf{s}_m + \mathbf{N}) \star \mathbf{h} \bigr)(t) \\
&= \tilde{s}_m(t) + \tilde{N}(t), \quad t \in \mathbb{R},
\end{aligned} \tag{26.113}$$

where

$$\tilde{\mathbf{s}}_m = \mathbf{s}_m \star \mathbf{h}, \quad m \in \mathcal{M}, \tag{26.114}$$

and

$$(\tilde{N}(t)) = (N(t)) \star h. \tag{26.115}$$

By Proposition 6.5.2, $\tilde{\mathbf{s}}_m$ is an integrable signal that is bandlimited to W Hz and

$$\tilde{s}_m(t) = \int_{-W}^{W} \hat{s}_m(f)\, \hat{h}(f)\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{26.116}$$

And, by Proposition 26.12.4, $(\tilde{N}(t))$ is white Gaussian noise of PSD 1 with respect
to the bandwidth W.

Loosely speaking, the main result of this section is that there is no loss in optimality
in guessing M based on the whitening filter's output $(\tilde{Y}(t))$. This is not very
surprising for the following reason. While passing (Y(t)) through the whitening
filter is not necessarily an invertible operation, it "almost" is, in the sense that we
can recover the original observation inside the band [−W, W]. Since the transmitted
signals are bandlimited to W Hz, we do not expect that the observation outside
this band will influence our guess.

Once this result is proved, the detection problem is reduced to detecting known
signals (the signals $\{\tilde{\mathbf{s}}_m\}$) in white Gaussian noise (the SP $(\tilde{N}(t))$). Employing
Theorem 26.4.1, we obtain that if guessing M based on the whitening filter's output
is optimal, then so is basing one's guess on the inner products vector

$$\bigl( \langle \tilde{\mathbf{Y}}, \tilde{\mathbf{s}}_1 \rangle, \ldots, \langle \tilde{\mathbf{Y}}, \tilde{\mathbf{s}}_M \rangle \bigr)^{\mathsf{T}}, \tag{26.117}$$

thus reducing the continuous-time detection problem to one where the observation
is a random vector taking value in $\mathbb{R}^M$.

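We next describe the sufficient statistic for our problem more carefully. Rather than
expressing the sufficient statistic as in (26.117), we prefer to express it directly in
terms of the observed signal (Y(t)) as the vector

$$\bigl( \langle \mathbf{Y}, \overleftarrow{\mathbf{h}} \star \tilde{\mathbf{s}}_1 \rangle, \ldots, \langle \mathbf{Y}, \overleftarrow{\mathbf{h}} \star \tilde{\mathbf{s}}_M \rangle \bigr)^{\mathsf{T}}, \tag{26.118}$$

where $\overleftarrow{\mathbf{h}}$ denotes the mirror image $t \mapsto h(-t)$ of $\mathbf{h}$, and where the equivalence of
the two forms can be formally derived as follows:

$$\begin{aligned}
\langle \tilde{\mathbf{Y}}, \tilde{\mathbf{s}}_m \rangle
&= \int_{-\infty}^{\infty} \biggl( \int_{-\infty}^{\infty} Y(\sigma)\, h(t - \sigma)\, d\sigma \biggr) \tilde{s}_m(t)\, dt \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} Y(\sigma)\, h(t - \sigma)\, \tilde{s}_m(t)\, dt\, d\sigma \\
&= \int_{-\infty}^{\infty} Y(\sigma) \biggl( \int_{-\infty}^{\infty} h(t - \sigma)\, \tilde{s}_m(t)\, dt \biggr) d\sigma \\
&= \int_{-\infty}^{\infty} Y(\sigma)\, \bigl( \overleftarrow{\mathbf{h}} \star \tilde{\mathbf{s}}_m \bigr)(\sigma)\, d\sigma \\
&= \langle \mathbf{Y}, \overleftarrow{\mathbf{h}} \star \tilde{\mathbf{s}}_m \rangle.
\end{aligned}$$

Note that for each $m \in \mathcal{M}$ the convolution $\overleftarrow{\mathbf{h}} \star \tilde{\mathbf{s}}_m$ is the result of passing the
signal $\tilde{\mathbf{s}}_m$, which is an integrable signal that is bandlimited to W Hz, through
the stable filter of impulse response $\overleftarrow{\mathbf{h}}$, so $\overleftarrow{\mathbf{h}} \star \tilde{\mathbf{s}}_m$ is an integrable signal that is
bandlimited to W Hz (Proposition 6.5.2). This integrability guarantees that the
inner products in (26.118) are well-defined (Proposition 25.10.1).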
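The interchange of integrals behind this equivalence has an exact finite-dimensional analogue: for finite sequences, filtering the observation and then correlating gives the same number as correlating the observation with the signal filtered by the time-reversed impulse response. The sketch below checks this discrete-time identity. It is not a simulation of the continuous-time setting, and all names and sample values are hypothetical.

```python
def convolve(x, h):
    """Full discrete convolution: out[n] = sum_k x[k] * h[n - k]."""
    out = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for j, hj in enumerate(h):
            out[k + j] += xk * hj
    return out


def mirrored_filter_output(s_tilde, h, length):
    """Samples c[k] = sum_n h[n - k] * s_tilde[n] for k = 0..length-1.

    This is the convolution of s_tilde with the mirror image of h, restricted
    to the indices where the observation lives. It assumes that
    len(s_tilde) == length + len(h) - 1.
    """
    return [sum(h[n - k] * s_tilde[n] for n in range(k, k + len(h)))
            for k in range(length)]


def inner(u, v):
    """Inner product of two real sequences of equal length."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    y = [1.0, -2.0, 0.5, 3.0]                  # observation samples
    h = [0.5, 0.25, -1.0]                      # filter impulse response
    s_tilde = [2.0, -1.0, 0.0, 1.5, -0.5, 1.0]
    lhs = inner(convolve(y, h), s_tilde)       # analogue of <Y * h, s~>
    rhs = inner(y, mirrored_filter_output(s_tilde, h, len(y)))
    print(lhs, rhs)
```

The two printed numbers agree up to floating-point rounding, mirroring the chain of equalities in the formal derivation.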

We can now state the main result of this section: 

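Theorem 26.12.5 (Detecting Known Signals in Colored Noise). Let M take value
in the finite set $\mathcal{M} = \{1, \ldots, M\}$, and let the signals $\mathbf{s}_1, \ldots, \mathbf{s}_M$ be integrable signals
that are bandlimited to W Hz. Let the conditional law of (Y(t)) given M = m be
that of $s_m(t) + N(t)$, where (N(t)) is a stationary, measurable, Gaussian SP of
PSD $S_{NN}$ that can be whitened with respect to the bandwidth W. Let h be the
impulse response of a whitening filter for (N(t)). Then:

(i) The inner-products vector (26.118) forms a sufficient statistic for guessing M
based on the observation (Y(t)).

(ii) Conditional on M = m, this vector is Gaussian with mean

$$\bigl( \langle \tilde{\mathbf{s}}_m, \tilde{\mathbf{s}}_1 \rangle, \ldots, \langle \tilde{\mathbf{s}}_m, \tilde{\mathbf{s}}_M \rangle \bigr)^{\mathsf{T}} \tag{26.119}$$

and M × M covariance matrix

$$\begin{pmatrix}
\langle \tilde{\mathbf{s}}_1, \tilde{\mathbf{s}}_1 \rangle & \langle \tilde{\mathbf{s}}_1, \tilde{\mathbf{s}}_2 \rangle & \cdots & \langle \tilde{\mathbf{s}}_1, \tilde{\mathbf{s}}_M \rangle \\
\langle \tilde{\mathbf{s}}_2, \tilde{\mathbf{s}}_1 \rangle & \langle \tilde{\mathbf{s}}_2, \tilde{\mathbf{s}}_2 \rangle & \cdots & \langle \tilde{\mathbf{s}}_2, \tilde{\mathbf{s}}_M \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle \tilde{\mathbf{s}}_M, \tilde{\mathbf{s}}_1 \rangle & \langle \tilde{\mathbf{s}}_M, \tilde{\mathbf{s}}_2 \rangle & \cdots & \langle \tilde{\mathbf{s}}_M, \tilde{\mathbf{s}}_M \rangle
\end{pmatrix}, \tag{26.120}$$

where

$$\tilde{\mathbf{s}}_j = \mathbf{s}_j \star \mathbf{h}, \quad j \in \mathcal{M}, \tag{26.121}$$

and where the inner product $\langle \tilde{\mathbf{s}}_{m'}, \tilde{\mathbf{s}}_{m''} \rangle$ can also be expressed as

$$\langle \tilde{\mathbf{s}}_{m'}, \tilde{\mathbf{s}}_{m''} \rangle = \int_{-W}^{W} \frac{\hat{s}_{m'}(f)\, \hat{s}^*_{m''}(f)}{S_{NN}(f)}\, df, \quad m', m'' \in \mathcal{M}. \tag{26.122}$$

(iii) If $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{d'})$ is an orthonormal $d'$-tuple of integrable signals that are bandlimited
to W Hz, and if

$$\tilde{\mathbf{s}}_m \in \operatorname{span}(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{d'}), \quad m \in \mathcal{M}, \tag{26.123}$$

then the inner products vector

$$\bigl( \langle \mathbf{Y}, \overleftarrow{\mathbf{h}} \star \boldsymbol{\phi}_1 \rangle, \ldots, \langle \mathbf{Y}, \overleftarrow{\mathbf{h}} \star \boldsymbol{\phi}_{d'} \rangle \bigr)^{\mathsf{T}} \tag{26.124}$$

forms a sufficient statistic for guessing M based on (Y(t)) and, conditional
on M = m, is a multivariate Gaussian of covariance matrix $\mathsf{I}_{d'}$ and of mean
vector

$$\bigl( \langle \tilde{\mathbf{s}}_m, \boldsymbol{\phi}_1 \rangle, \ldots, \langle \tilde{\mathbf{s}}_m, \boldsymbol{\phi}_{d'} \rangle \bigr)^{\mathsf{T}}. \tag{26.125}$$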

Proof. The sufficiency of the vector (26.118) can be established by first proving
the result in the binary antipodal case, and by then generalizing the result as we
did in the proof of Theorem 26.4.1.

In the binary antipodal case we denote the RV to be guessed by H and assume
that it takes value in {0, 1}. We assume that, conditional on H = 0, the time-t
received waveform is s(t) + N(t) whereas, conditional on H = 1, it is −s(t) + N(t).
We show that for every $\eta \in \mathbb{N}$ and any choice of the epochs $t_1, \ldots, t_\eta \in \mathbb{R}$, the inner
product $\langle \mathbf{Y}, \overleftarrow{\mathbf{h}} \star \tilde{\mathbf{s}} \rangle$ forms a sufficient statistic for guessing H based on the vector

$$\bigl( Y(t_1), \ldots, Y(t_\eta), \langle \mathbf{Y}, \overleftarrow{\mathbf{h}} \star \tilde{\mathbf{s}} \rangle \bigr)^{\mathsf{T}}.$$

As in the proof of Theorem 26.4.1, this can be established using Lemma 26.8.1 as
follows. One first notes that, conditional on H, this vector is Gaussian (Proposition
25.11.1). One then notes that the conditional covariance matrix of this vector
conditional on H = 0 is the same as conditional on H = 1 and that this covariance
matrix can be computed using Theorem 25.12.2. Finally one shows that the vector's
conditional mean vector, conditional on H = 0, is antipodal to its conditional
mean vector, conditional on H = 1, and that both are scaled versions of the last
column of the conditional covariance matrix.

Once the sufficiency of the vector (26.118) has been established, the computation
of its conditional law is straightforward: by Proposition 25.11.1 it is conditionally
Gaussian, and its conditional mean (26.119) and conditional covariance (26.120)
are readily derived using Theorem 25.12.2. The derivation of (26.122) follows from
(26.116) using the Mini Parseval Theorem (Proposition 6.2.6 (i)) and (26.111).

An alternative way of deriving the conditional distribution is to note that the vector
(26.118) can also be expressed as the vector (26.117) and to then use the result
from Section 26.5.1 by substituting 1 for $N_0/2$ and $\tilde{\mathbf{s}}_m$ for $\mathbf{s}_m$ for all $m \in \mathcal{M}$.

Part (iii) follows directly from Parts (i) and (ii). □

Since the inner products $\langle \tilde{\mathbf{s}}_{m'}, \tilde{\mathbf{s}}_{m''} \rangle$ for $m', m'' \in \mathcal{M}$ determine the conditional law
of the sufficient statistic (see (26.119) & (26.120)), and since, by (26.122), the inner
product $\langle \tilde{\mathbf{s}}_{m'}, \tilde{\mathbf{s}}_{m''} \rangle$ does not depend on the choice of the whitening filter, we obtain:

Note 26.12.6. Neither the conditional distribution of the sufficient statistic vector
in (26.118) nor the optimal probability of error depends on the choice of the
whitening filter.

Using Theorem 26.12.5 we can now derive an optimal rule for guessing M. Indeed, 
in analogy to Theorem 26.6.3 we have: 

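Theorem 26.12.7. Consider the setting of Theorem 26.12.5 with M of prior $\{\pi_m\}$.
The decision rule that guesses uniformly at random from among all the messages
$\tilde{m} \in \mathcal{M}$ for which

$$\begin{aligned}
&\ln \pi_{\tilde{m}} - \frac{1}{2} \sum_{\ell=1}^{d} \Bigl( \langle \mathbf{Y}, \overleftarrow{\mathbf{h}} \star \boldsymbol{\phi}_\ell \rangle - \langle \mathbf{s}_{\tilde{m}}, \overleftarrow{\mathbf{h}} \star \boldsymbol{\phi}_\ell \rangle \Bigr)^2 \\
&\quad = \max_{m' \in \mathcal{M}} \biggl\{ \ln \pi_{m'} - \frac{1}{2} \sum_{\ell=1}^{d} \Bigl( \langle \mathbf{Y}, \overleftarrow{\mathbf{h}} \star \boldsymbol{\phi}_\ell \rangle - \langle \mathbf{s}_{m'}, \overleftarrow{\mathbf{h}} \star \boldsymbol{\phi}_\ell \rangle \Bigr)^2 \biggr\}
\end{aligned} \tag{26.126}$$

minimizes the probability of error whenever h is a whitening filter and the tuple
$(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ forms an orthonormal basis for $\operatorname{span}(\mathbf{s}_1 \star \mathbf{h}, \ldots, \mathbf{s}_M \star \mathbf{h})$.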
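Once the d inner products have been computed, the rule of Theorem 26.12.7 is a finite maximization. A minimal Python sketch of this last step follows. It assumes that the numbers $\langle \mathbf{Y}, \overleftarrow{\mathbf{h}} \star \boldsymbol{\phi}_\ell \rangle$ and $\langle \mathbf{s}_m, \overleftarrow{\mathbf{h}} \star \boldsymbol{\phi}_\ell \rangle$ are already available as lists and that all prior probabilities are positive. The names `map_decisions`, `observation`, `means`, and `prior` are hypothetical.

```python
import math
import random


def map_decisions(observation, means, prior):
    """Return every message index attaining the maximum in (26.126).

    observation[l] stands for <Y, h_mirror * phi_l>, means[m][l] for
    <s_m, h_mirror * phi_l>, and prior[m] for pi_m (assumed positive).
    Indices are zero-based.
    """
    scores = []
    for mean, pi in zip(means, prior):
        dist_sq = sum((t - mu) ** 2 for t, mu in zip(observation, mean))
        scores.append(math.log(pi) - 0.5 * dist_sq)
    best = max(scores)
    return [m for m, score in enumerate(scores) if score == best]


def guess(observation, means, prior, rng=random):
    """Pick uniformly at random among the maximizers, as the theorem prescribes."""
    return rng.choice(map_decisions(observation, means, prior))


if __name__ == "__main__":
    means = [[1.0, 0.0], [0.0, 1.0]]
    print(map_decisions([0.9, 0.1], means, [0.5, 0.5]))
    print(map_decisions([0.5, 0.5], means, [0.1, 0.9]))
```

The factor 1/2 in the score reflects that the whitened noise has PSD 1, so each component of the sufficient statistic has unit variance.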

Before concluding our discussion of detection in the presence of colored noise we 
derive here a sufficient condition for the existence of a whitening filter. 

Proposition 26.12.8 (Existence of a Whitening Filter). Let W > 0 be fixed. If
throughout the interval [−W, W] the PSD $S_{NN}$ is strictly positive and twice continuously
differentiable, then there exists a whitening filter for $S_{NN}$ with respect to
the bandwidth W.

Proof. The proof hinges on the following basic result from harmonic analysis
(Katznelson, 1976, Chapter VI, Section 1, Exercise 7): if a function $f \mapsto g(f)$
is twice continuously differentiable and is zero outside some interval [−Δ, Δ], then
it is the FT of some integrable function.

To prove the proposition using this result we begin by picking some Δ > W. We
now define a function $g \colon \mathbb{R} \to \mathbb{R}$ as follows. For $f \ge \Delta$, we define $g(f) = 0$.
For f in the interval [0, W], we define $g(f) = 1/\sqrt{S_{NN}(f)}$. And for $f \in (W, \Delta)$,
we define g(f) so that g be twice continuously differentiable in [0, ∞). We can
thus think of g in [W, Δ] as an interpolation function whose values and first two
derivatives are specified at the endpoints of the interval. Finally, for f < 0, we
define g(f) as g(−f). Figure 26.10 depicts $S_{NN}$, g, W, and Δ.

A whitening filter for $S_{NN}$ with respect to the bandwidth W is the integrable
function whose FT is g and whose existence is guaranteed by the quoted result. □



26.13 Detecting Signals of Infinite Bandwidth 

So far we have only dealt with the detection problem when the mean signals are 
bandlimited. What if the mean signals are not bandlimited? The difficulty in this 
case is that we cannot assume that the noise PSD is constant over the bandwidth 
occupied by the mean signals, or that the noise can be whitened with respect to 
this bandwidth. 

We can address this issue in three different ways. In the first we can try to find the 
optimal detector by studying this more complicated hypothesis testing problem. It 
will no longer be the case that the inner products vector (26.15) forms a sufficient 
statistic. It will turn out that the optimal detector greatly depends on the relationship
between the rate of decay of the PSD of the noise as the frequency tends
to ±∞ and the rate of decay of the FT of the mean signals. This approach will
often lead to bad designs, because the structure of the receiver will depend greatly
on how we model the noise, and inaccuracies in our modeling of the noise PSD at
ultra-high frequencies might lead us completely astray in our design.

Figure 26.10: The frequency response of a whitening filter for the PSD $S_{NN}$ with
respect to the bandwidth W.

A more level-headed approach that is valid if the noise PSD is "essentially flat 
over the bandwidth of interest" is to ignore the fact that the mean signals are not 
bandlimited and to base our decision on the inner products vector, even if this is 
not fully justified mathematically. This approach leads to robust designs that are 
insensitive to inaccuracies in our modeling of the noise process. If the PSD is not 
essentially flat, we can whiten it with respect to a sufficiently large band [−W, W]
that contains most of the energy of the mean signals. 

The third approach is to use very complicated mathematical machinery involving 
the Ito Calculus (Karatzas and Shreve, 1991) to model the noise in a way that will 
result in the inner products forming a sufficient statistic. We have chosen not to 
pursue this approach because it requires modeling the noise as a process of infinite 
power, which is physically unappealing. This approach just shifts the burden of 
proof from one place to another. Indeed, the Ito Calculus can now prove for us 
that the inner products vector is sufficient, but we need a leap of faith in modeling 
the noise as a process of infinite power. 

In the future, in dealing with mean signals that are not bandlimited, we shall refer
to the "white noise paradigm" as the paradigm under which the receiver forms its
decision based on the inner products vector (26.15) and under which these inner
products have the conditional law derived in Section 26.5.1.

26.14 A Proof of Lemma 26.8.1

We next present a proof of Lemma 26.8.1 that is also valid when the matrix $\Lambda$ is
singular. We denote the Row-j Column-k component of this matrix by $\Lambda^{(j,k)}$.

Proof. We first treat the case where the variance $\Lambda^{(n,n)}$ of the last component $Y^{(n)}$
of the random n-vector $\mathbf{Y}$ is zero. By the Covariance Inequality it follows that for
every $j \in \{1, \ldots, n\}$

$$\bigl| \Lambda^{(j,n)} \bigr| = \bigl| \operatorname{Cov}\bigl[ Y^{(j)}, Y^{(n)} \bigr] \bigr| \le \sqrt{\operatorname{Var}\bigl[ Y^{(j)} \bigr]}\, \sqrt{\operatorname{Var}\bigl[ Y^{(n)} \bigr]} = \sqrt{\Lambda^{(j,j)}}\, \sqrt{\Lambda^{(n,n)}},$$

so in this case the n-th column of $\Lambda$ is zero. Consequently, since the mean vector $\boldsymbol{\mu}$
is by assumption proportional to the last column of $\Lambda$, it follows that in this case
$\boldsymbol{\mu} = \mathbf{0}$. But for $\boldsymbol{\mu} = \mathbf{0}$ the conditional law of $\mathbf{Y}$ given H = 0 is the same as given
H = 1, so $\mathbf{Y}$ is useless for guessing H, and any measurable function of $\mathbf{Y}$, and a
fortiori its last component, forms a sufficient statistic (albeit also useless).

We next turn to the more interesting case where

$$\Lambda^{(n,n)} > 0. \tag{26.127}$$

In this case we can write the assumption that $\boldsymbol{\mu}$ is a scaled version of the last
column of $\Lambda$ as

$$\mu^{(j)} = \mu^{(n)}\, \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}, \quad j \in \{1, \ldots, n\}. \tag{26.128}$$

We need to show that, irrespective of the prior on H, (26.128) implies that

$$H \multimap\!\!- Y^{(n)} \multimap\!\!- \mathbf{Y} \tag{26.129}$$

forms a Markov chain or, equivalently, that

$$H \multimap\!\!- Y^{(n)} \multimap\!\!- \mathbf{R} \tag{26.130}$$

forms a Markov chain, where $\mathbf{R}$ is the random (n − 1)-vector of components

$$R^{(j)} = Y^{(j)} - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, Y^{(n)}, \quad j \in \{1, \ldots, n-1\}. \tag{26.131}$$

(Conditional on $Y^{(n)}$, we have that $\frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}} Y^{(n)}$ is deterministic, so $R^{(j)}$ and $Y^{(j)}$ only
differ by a deterministic constant.) Thus, we need to show that $\mathbf{R}$ is irrelevant for
guessing H based on $Y^{(n)}$. This we prove using Proposition 22.5.5 by showing that
$Y^{(n)}$ and $\mathbf{R}$ are conditionally independent given H

$$Y^{(n)} \multimap\!\!- H \multimap\!\!- \mathbf{R} \tag{26.132}$$

and that H and $\mathbf{R}$ are independent.

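We begin by proving (26.132). We first note that, conditional on H = 0, the vector

$$\bigl( R^{(1)}, \ldots, R^{(n-1)}, Y^{(n)} \bigr)^{\mathsf{T}}$$

is Gaussian, because it is the result of linearly transforming the vector $\mathbf{Y}$, which,
conditional on H = 0, is Gaussian (Proposition 23.6.3). Also, conditional on
H = 0, we have that $Y^{(n)}$ is uncorrelated with the components of $\mathbf{R}$ because

$$\begin{aligned}
\operatorname{Cov}\bigl[ Y^{(n)}, R^{(j)} \bigm| H = 0 \bigr]
&= \operatorname{Cov}\Bigl[ Y^{(n)},\, Y^{(j)} - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, Y^{(n)} \Bigm| H = 0 \Bigr] \\
&= \operatorname{Cov}\bigl[ Y^{(n)}, Y^{(j)} \bigm| H = 0 \bigr] - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, \operatorname{Cov}\bigl[ Y^{(n)}, Y^{(n)} \bigm| H = 0 \bigr] \\
&= \Lambda^{(n,j)} - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, \Lambda^{(n,n)} \\
&= 0, \quad j \in \{1, \ldots, n-1\}.
\end{aligned}$$

By Corollary 23.6.9, we conclude that, conditional on H = 0, we have that $Y^{(n)}$ is
independent of $\mathbf{R}$. Repeating this argument for the case where the conditioning is
on H = 1 proves (26.132).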
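The decorrelation step can be illustrated numerically. The map from $\mathbf{Y}$ to $(R^{(1)}, \ldots, R^{(n-1)}, Y^{(n)})^{\mathsf{T}}$ defined by (26.131) is linear, so the covariance matrix of the transformed vector is $\mathsf{T} \Lambda \mathsf{T}^{\mathsf{T}}$ for a suitable matrix $\mathsf{T}$. The sketch below builds $\mathsf{T}$ for a hypothetical 3 × 3 covariance matrix and checks that the cross-covariances between $Y^{(n)}$ and the components of $\mathbf{R}$ vanish. It also applies $\mathsf{T}$ to a mean vector satisfying (26.128) to show that the first n − 1 components of the transformed mean are zero. The matrix entries and the helper names are illustrative assumptions.

```python
def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def mat_vec(a, x):
    """Product of a matrix and a vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def residual_transform(cov):
    """Matrix T with T Y = (R^(1), ..., R^(n-1), Y^(n)) as in (26.131).

    Assumes that the last diagonal entry of cov is strictly positive.
    """
    n = len(cov)
    t = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for j in range(n - 1):
        t[j][n - 1] = -cov[j][n - 1] / cov[n - 1][n - 1]
    return t


if __name__ == "__main__":
    cov = [[2.0, 0.5, 1.0],
           [0.5, 1.0, 0.3],
           [1.0, 0.3, 4.0]]                  # hypothetical covariance matrix
    mean = [0.7 * row[2] for row in cov]     # proportional to the last column
    t = residual_transform(cov)
    new_cov = mat_mul(mat_mul(t, cov), transpose(t))
    print([new_cov[j][2] for j in range(2)])  # Cov[R^(j), Y^(n)]
    print(mat_vec(t, mean)[:2])               # E[R^(j)] under (26.128)
```

Both printed lists contain zeros up to floating-point rounding.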

We next verify that $\mathbf{R}$ and H are independent. We do so by showing that the
conditional distribution of $\mathbf{R}$ given H = 0 is identical to its conditional distribution
given H = 1. Since under both conditionings $\mathbf{R}$ is Gaussian, it suffices to show
that the conditional covariance of $\mathbf{R}$ given H = 0 is the same as given H = 1 and
similarly for the mean. To show that the covariances are the same is easy, because
the conditional covariance of $\mathbf{R}$ is determined by $\Lambda$, which is the same under the
two hypotheses. As to the mean we have

$$\begin{aligned}
\mathsf{E}\bigl[ R^{(j)} \bigm| H = 0 \bigr]
&= \mathsf{E}\Bigl[ Y^{(j)} - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, Y^{(n)} \Bigm| H = 0 \Bigr] \\
&= \mathsf{E}\bigl[ Y^{(j)} \bigm| H = 0 \bigr] - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, \mathsf{E}\bigl[ Y^{(n)} \bigm| H = 0 \bigr] \\
&= \mu^{(j)} - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, \mu^{(n)} \\
&= 0, \quad j \in \{1, \ldots, n-1\},
\end{aligned}$$

where the last equality follows from (26.128). Similarly, under H = 1 we have

$$\begin{aligned}
\mathsf{E}\bigl[ R^{(j)} \bigm| H = 1 \bigr]
&= \mathsf{E}\Bigl[ Y^{(j)} - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, Y^{(n)} \Bigm| H = 1 \Bigr] \\
&= \mathsf{E}\bigl[ Y^{(j)} \bigm| H = 1 \bigr] - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, \mathsf{E}\bigl[ Y^{(n)} \bigm| H = 1 \bigr] \\
&= -\mu^{(j)} - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, \bigl( -\mu^{(n)} \bigr) \\
&= 0, \quad j \in \{1, \ldots, n-1\},
\end{aligned}$$

thus establishing that the mean of $\mathbf{R}$ does not depend on H either.

Having established that $\mathbf{R}$ and H are independent, it now follows from the conditional
independence of $Y^{(n)}$ and $\mathbf{R}$ given H (26.132) that $\mathbf{R}$ is irrelevant for
guessing H based on $Y^{(n)}$ (Proposition 22.5.5). □

26.15 Exercises

Exercise 26.1 (Reducing the Number of Matched Filters). We saw in Section 26.4 how to
obtain a d-dimensional sufficient statistics vector, where d is the dimension of the linear
subspace spanned by the mean signals (26.17). Show that, given any integrable signal $\mathbf{s}_0$
that is bandlimited to W Hz, we can find a $d'$-dimensional sufficient statistics vector,
where

$$d' = \operatorname{Dim}\bigl( \operatorname{span}(\mathbf{s}_1 - \mathbf{s}_0, \ldots, \mathbf{s}_M - \mathbf{s}_0) \bigr).$$

Show that $d'$ is sometimes smaller than d.

Exercise 26.2 (Nearest-Neighbor Decoding Revisited). The form of the decoder in Theorem
26.6.3 (ii) is different from the nearest-neighbor rule of Proposition 21.6.1 (ii).
Why does minimizing $\|\mathbf{Y} - \mathbf{s}_m\|_2$ not make mathematical sense in the setting of Theorem
26.6.3?

Exercise 26.3 (Proving Sufficiency). In Section 26.8.3 we sketched an argument for the 
sufficiency of the vector in (26.57). Fill in the details. 

Exercise 26.4 (Minimum Shift Keying). Let the signals $\mathbf{s}_0, \mathbf{s}_1$ be given at every $t \in \mathbb{R}$ by

$$s_0(t) = \sqrt{\frac{2E_s}{T_s}} \cos(2\pi f_0 t)\, I\{0 \le t \le T_s\}, \quad s_1(t) = \sqrt{\frac{2E_s}{T_s}} \cos(2\pi f_1 t)\, I\{0 \le t \le T_s\}.$$

(i) Compute the energies $\|\mathbf{s}_0\|_2^2$, $\|\mathbf{s}_1\|_2^2$. You may assume that $f_0 T_s \gg 1$ and $f_1 T_s \gg 1$.

(ii) Under what conditions on $f_0$, $f_1$, and $T_s$ are $\mathbf{s}_0$ and $\mathbf{s}_1$ orthogonal?

(iii) Assume that the parameters are chosen as in Part (ii). Let H take on the values
0 and 1 equiprobably, and assume that, conditional on $H = \nu$, the time-t received
waveform is $s_\nu(t) + N(t)$ where (N(t)) is white Gaussian noise of double-sided
PSD $N_0/2$ with respect to the bandwidth of interest, and $\nu \in \{0, 1\}$. Find an
optimal rule for guessing H based on the received waveform.

(iv) Compute the optimal probability of error. 

Exercise 26.5 (Signaling in White Gaussian Noise). Let the RV M take value in the set
$\mathcal{M} = \{1, 2, 3, 4\}$ uniformly. Conditional on M = m, the observed waveform (Y(t)) is
given at every time $t \in \mathbb{R}$ by $s_m(t) + N(t)$, where the signals $\mathbf{s}_1, \mathbf{s}_2, \mathbf{s}_3, \mathbf{s}_4$ are given by

$$s_1(t) = A\, I\{0 \le t \le T\}, \quad s_2(t) = A\, I\{0 \le t \le T/2\} - A\, I\{T/2 < t \le T\},$$

$$s_3(t) = 2A\, I\{0 \le t \le T/2\}, \quad s_4(t) = -A\, I\{0 \le t \le T/2\} + A\, I\{T/2 < t \le T\},$$

and where (N(t)) is white Gaussian noise of PSD $N_0/2$ over the bandwidth of interest.
(Ignore the fact that the signals are not bandlimited.)

(i) Derive the MAP rule for guessing M based on (Y(t)).

(ii) Use the Union-of-Events Bound to upper bound $p_{\text{MAP}}(\text{error} \mid M = 3)$. Are all the
terms in the bound needed?

(iii) Compute $p_{\text{MAP}}(\text{error} \mid M = 3)$ exactly.

(iv) Show that by subtracting a waveform $\mathbf{s}^*$ from each of the signals $\mathbf{s}_1, \mathbf{s}_2, \mathbf{s}_3, \mathbf{s}_4$, we
can reduce the average transmitted energy without degrading performance. What
waveform $\mathbf{s}^*$ should be subtracted to minimize the transmitted energy?

Exercise 26.6 (QPSK). Let the IID random bits $D_1$ and $D_2$ be mapped to the symbols
$X_1, X_2$ according to the rule

$$(0,0) \mapsto (1,0), \quad (0,1) \mapsto (-1,0), \quad (1,0) \mapsto (0,1), \quad (1,1) \mapsto (0,-1).$$

The received waveform (Y(t)) is given by

$$Y(t) = A X_1 \phi_1(t) + A X_2 \phi_2(t) + N(t), \quad t \in \mathbb{R},$$

where A > 0, the signals $\boldsymbol{\phi}_1, \boldsymbol{\phi}_2$ are orthonormal integrable signals that are bandlimited
to W Hz, and the SP (N(t)) is independent of $(D_1, D_2)$ and is white Gaussian noise of
PSD $N_0/2$ with respect to the bandwidth W.

(i) Find an optimal rule for guessing $(D_1, D_2)$ based on (Y(t)).

(ii) Find an optimal rule for guessing $D_1$ based on (Y(t)).

(iii) Compare the rule that you have found in Part (ii) with the rule that guesses that $D_1$
is the first component of the tuple produced by the decoder that you have found
in Part (i). Evaluate the probability of error for both rules.

(iv) Repeat when $(D_1, D_2)$ are mapped to $(X_1, X_2)$ according to the rule

$$(0,0) \mapsto (1,0), \quad (0,1) \mapsto (0,1), \quad (1,0) \mapsto (-1,0), \quad (1,1) \mapsto (0,-1).$$



Exercise 26.7 (Mismatched Decoding of Antipodal Signaling). Let the received waveform 
$(Y(t))$ be given at every $t \in \mathbb{R}$ by $(1 - 2H)\, s(t) + N(t)$, where $\mathbf{s}$ is an integrable signal 
that is bandlimited to W Hz, $(N(t))$ is white Gaussian noise of PSD $N_0/2$ with respect 
to the bandwidth W, and H takes on the values 0 and 1 equiprobably and independently 
of $(N(t))$. Let $\mathbf{s}'$ be an integrable signal that is bandlimited to W Hz. A suboptimal 
detector feeds the received waveform to a matched filter for $\mathbf{s}'$ and guesses according to 
the filter's time-0 output: if it is positive, it guesses "H = 0," and if it is negative, it 
guesses "H = 1." Express this detector's probability of error in terms of $\mathbf{s}$, $\mathbf{s}'$, and $N_0$. 

Exercise 26.8 (Imperfect Automatic Gain Control). Let the received signal $(Y(t))$ be 
given by 

\[ Y(t) = A X s(t) + N(t), \quad t \in \mathbb{R}, \]

where A > 0 is some deterministic positive constant, X is a RV that takes value in 
the set $\{-3, -1, +1, +3\}$ uniformly, $\mathbf{s}$ is an integrable signal that is bandlimited to W 
Hz, and $(N(t))$ is white Gaussian noise of double-sided PSD $N_0/2$ with respect to the 
bandwidth W. 

(i) Find an optimal rule for guessing X based on $(Y(t))$. 
(ii) Using the Q-function compute the optimal probability of error. 
(iii) Suppose you use the rule you have found in Part (i), but the received signal is 

Y(t) = -AXs(t)+N(t), (el. 

(You were misinformed about the amplitude of the signal.) What is the probability 
of error now? 

Exercise 26.9 (Positive Semidefinite Matrices). 

(i) Let $\mathbf{s}_1, \ldots, \mathbf{s}_M$ be of finite energy. Show that the $M \times M$ matrix whose Row-j 
Column-$\ell$ entry is $\langle \mathbf{s}_j, \mathbf{s}_\ell \rangle$ is positive semidefinite. 

(ii) Show that any $M \times M$ positive semidefinite matrix can be expressed in this form 
with a proper choice of the signals $\mathbf{s}_1, \ldots, \mathbf{s}_M$. 

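Exercise 26.10 (A Lower Bound on the Minimum Distance). Let $\mathbf{s}_1, \ldots, \mathbf{s}_M$ be 
equi-energy signals of energy $E_s$. Let 

\[ \bar{d}^2 \triangleq \frac{1}{M(M-1)} \sum_{m'} \sum_{m'' \neq m'} \|\mathbf{s}_{m'} - \mathbf{s}_{m''}\|_2^2 \]

denote the average squared-distance between the signals. 

(i) Justify the following bound on $\bar{d}^2$: 

\[ \begin{aligned}
\bar{d}^2 &= \frac{1}{M(M-1)} \sum_{m'=1}^{M} \sum_{m''=1}^{M} \|\mathbf{s}_{m'} - \mathbf{s}_{m''}\|_2^2 \\
&= \frac{2M}{M-1} E_s - \frac{2M}{M-1} \cdot \frac{1}{M^2} \sum_{m'=1}^{M} \sum_{m''=1}^{M} \langle \mathbf{s}_{m'}, \mathbf{s}_{m''} \rangle \\
&= \frac{2M}{M-1} E_s - \frac{2M}{M-1} \Bigl\| \frac{1}{M} \sum_{m=1}^{M} \mathbf{s}_m \Bigr\|_2^2 \\
&\le \frac{2M}{M-1} E_s.
\end{aligned} \]

(ii) Show that if, in addition, $\langle \mathbf{s}_{m'}, \mathbf{s}_{m''} \rangle = \rho E_s$ for all $m' \neq m''$ in $\{1, \ldots, M\}$, then 

\[ -\frac{1}{M-1} \le \rho \le 1. \]

(iii) Are equalities possible in the above bounds? 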

Exercise 26.11 (Generalizations of the Simplex). Let $p^*(\mathrm{error}; E_s; \rho; M; N_0)$ denote the 
optimal probability of error for the setup of Section 26.2 for the case where the prior 
on M is uniform and where 

\[ \langle \mathbf{s}_{m'}, \mathbf{s}_{m''} \rangle = \begin{cases} E_s & \text{if } m' = m'', \\ \rho E_s & \text{otherwise,} \end{cases} \qquad m', m'' \in \{1, \ldots, M\}. \]

Show that 

\[ p^*(\mathrm{error}; E_s; \rho; M; N_0) = p^*\bigl(\mathrm{error}; E_s(1-\rho); 0; M; N_0\bigr), \qquad -\frac{1}{M-1} \le \rho \le 1. \]

Hint: You may need a different proof depending on the sign of $\rho$. 

Exercise 26.12 (Decoding the Simplex without Gain Control). Let the simplex 
constellation $\mathbf{s}_1, \ldots, \mathbf{s}_M$ be constructed from the orthonormal signals $\phi_1, \ldots, \phi_M$ as in 
Section 26.11.4. In that section we proposed to decode by adding 

\[ \frac{1}{\sqrt{M-1}} \sqrt{E_s}\, \psi \]

to the received signal $\mathbf{Y}$ and then feeding the result to a decoder that was designed for 
the orthogonal signals 

Here $\psi$ is any signal that is orthogonal to the signals $\{\mathbf{s}_1, \ldots, \mathbf{s}_M\}$. Show that feeding the 
signal $\mathbf{Y} + \alpha \psi$ to the above orthogonal-keying decoder also results in an optimal decoding 
rule, irrespective of the value of $\alpha \in \mathbb{R}$. 

Exercise 26.13 (Pretending the Noise Is White). Let H take on the values 0 and 1 
equiprobably, and let the received waveform $(Y(t))$ be given at time t by 

\[ Y(t) = (1 - 2H)\, s(t) + N(t), \]

where $\mathbf{s}\colon t \mapsto \mathrm{I}\{0 \le t \le 1\}$, and where the SP $(N(t))$ is independent of H and is a 
measurable, centered, stationary, Gaussian SP of autocovariance function 

\[ K_{NN}(\tau) = \frac{1}{4\alpha}\, e^{-|\tau|/\alpha}, \quad \tau \in \mathbb{R}, \]

where $0 < \alpha < \infty$ is some deterministic real parameter. Compute the probability of error 
of a detector that guesses "H = 0" whenever 

\[ \int_0^1 Y(t)\, \mathrm{d}t > 0. \]

To what does this probability of error converge when $\alpha$ tends to zero? 

Exercise 26.14 (Antipodal Signaling in Colored Noise). Let $\mathbf{s}$ be an integrable signal that 
is bandlimited to W Hz, and let H take on the values 0 and 1 equiprobably. Let the time-t 
value of the received signal $(Y(t))$ be given by $(1 - 2H)\, s(t) + N(t)$, where $(N(t))$ is a 
measurable, centered, stationary, Gaussian SP of autocovariance function $K_{NN}$. Assume 
that H and $(N(t))$ are independent, and that $K_{NN}$ can be whitened with respect to the 
bandwidth W. Find the optimal probability of error in guessing H based on $(Y(t))$. 

Exercise 26.15 (Modeling Artifacts). Let H take on the values 0 and 1 equiprobably, and 
let the received signal $(Y(t))$ be given by 

\[ Y(t) = (1 - 2H)\, s(t) + N(t), \quad t \in \mathbb{R}, \]

where $\mathbf{s}\colon t \mapsto \mathrm{I}\{0 \le t \le 1\}$ and the SP $(N(t))$ is independent of H and is a measurable, 
centered, stationary, Gaussian SP of autocovariance function 

\[ K_{NN}(\tau) = \alpha\, e^{-\tau^2/\beta}, \quad \tau \in \mathbb{R}, \]

for some $\alpha, \beta > 0$. 

Argue heuristically that, irrespective of the values of $\alpha$ and $\beta$, for any $\epsilon > 0$ we can find 
a rule for guessing H based on $(Y(t))$ whose probability of error is smaller than $\epsilon$. 

Hint: Study $\hat{s}(f)$ and $S_{NN}(f)$ at high frequencies f. 

Exercise 26.16 (Measurability in Theorem 26.3.2). 

(i) Let $(N(t))$ be white Gaussian noise of double-sided PSD $N_0/2$ with respect to the 
bandwidth W. Let R be a unit-mean exponential RV that is independent of $(N(t))$. 
Define the SP 

\[ \tilde{N}(t) = N(t)\, \mathrm{I}\{t \neq R\}, \quad t \in \mathbb{R}. \]

Show that $(\tilde{N}(t))$ is white Gaussian noise of double-sided PSD $N_0/2$ with respect 
to the bandwidth W. 

(ii) Let $\mathbf{s}$ be a nonzero integrable signal that is bandlimited to W Hz. To be concrete, 

\[ s(t) = \operatorname{sinc}^2(Wt), \quad t \in \mathbb{R}. \]

Suppose that the SP $(N(t))$ is as above and that for every $\omega \in \Omega$ the sample-path 
$t \mapsto N(\omega, t)$ is continuous. Construct $(\tilde{N}(t))$ as above. Suppose you wish to test 
whether you are observing $\mathbf{s}$ or $-\mathbf{s}$ in the additive noise $(\tilde{N}(t))$. Show that you can 
guess with zero probability of error by finding an epoch where the observed SP is 
discontinuous and by comparing the value of the received signal at that epoch to 
the value of $\mathbf{s}$. (This does not violate Theorem 26.3.2 because this decision rule is 
not measurable with respect to the Borel $\sigma$-algebra generated by the observed SP.) 



Chapter 27 

Noncoherent Detection and Nuisance 
Parameters 



27.1 Introduction and Motivation 

In this chapter we discuss a problem that arises in noncoherent detection. To 
motivate the problem, consider a setup where a transmitter sends one of two different 
passband waveforms 

\[ t \mapsto 2 \operatorname{Re}\bigl(s_{0,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr) \quad \text{or} \quad t \mapsto 2 \operatorname{Re}\bigl(s_{1,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr), \]

where $\mathbf{s}_{0,\mathrm{BB}}$ and $\mathbf{s}_{1,\mathrm{BB}}$ are integrable baseband signals that are bandlimited to W/2 
Hz, and where the carrier frequency $f_c$ satisfies $f_c > W/2$. To motivate our problem 
it is instructive to consider the case where 

\[ f_c \gg W. \tag{27.1} \]

(In wireless communications it is common for $f_c$ to be three orders of magnitude 
larger than W.) Let X(t) denote the transmitted waveform at time t. Suppose 
that the received waveform $(Y(t))$ is a delayed version of the transmitted waveform 
corrupted by white Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W 
around the carrier frequency $f_c$ (Definition 25.15.3): 

\[ Y(t) = X(t - t_D) + N(t), \quad t \in \mathbb{R}, \]

where $t_D$ denotes the delay (typically proportional to the distance between the 
transmitter and the receiver) and $(N(t))$ is the additive noise. Suppose further 
that the receiver estimates the delay to be $t'_D$ and moves its clock back by defining 

\[ t' = t - t'_D. \tag{27.2} \]

If $\tilde{Y}(t')$ is what the receiver receives when its clock shows $t'$, then by (27.2) 

\[ \begin{aligned}
\tilde{Y}(t') &= Y(t' + t'_D) \\
&= X(t' + t'_D - t_D) + N(t' + t'_D) \\
&= X(t' + t'_D - t_D) + \tilde{N}(t'), \quad t' \in \mathbb{R},
\end{aligned} \]

where $\tilde{N}(t') = N(t' + t'_D)$ and is thus, by the stationarity of $(N(t))$, also white 
Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W around $f_c$. The 
term $X(t' + t'_D - t_D)$ can be more explicitly written for every $t' \in \mathbb{R}$ as 

\[ X(t' + t'_D - t_D) = 2 \operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t' + t'_D - t_D)\, e^{i 2\pi f_c (t' + t'_D - t_D)}\bigr), \tag{27.3} \]

where $\nu$ is either zero or one, depending on which waveform is sent. 
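We next argue that if 

\[ |t'_D - t_D| \ll \frac{1}{W}, \tag{27.4} \]

then 

\[ s_{\nu,\mathrm{BB}}(t' + t'_D - t_D) \approx s_{\nu,\mathrm{BB}}(t'), \quad t' \in \mathbb{R}. \tag{27.5} \]

This can be seen by considering a Taylor Series expansion for $s_{\nu,\mathrm{BB}}(\cdot)$ around $t'$ 

\[ s_{\nu,\mathrm{BB}}(t' + t'_D - t_D) \approx s_{\nu,\mathrm{BB}}(t') + \frac{\mathrm{d} s_{\nu,\mathrm{BB}}(\tau)}{\mathrm{d}\tau} \bigg|_{\tau = t'} (t'_D - t_D) \]

and by then using Bernstein's Inequality (Theorem 6.7.1) to heuristically argue that 
the derivative of the baseband signal is of order of magnitude W, so its product by 
the timing error is, by (27.4), negligible. 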
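
The heuristic behind (27.5) can be illustrated numerically. The following is a minimal sketch under stated assumptions: the pulse sinc^2(Wt/2) and the values of W and of the timing error are hypothetical choices made only for illustration, and the pulse is normalized to have peak value one. The sketch measures how much a signal that is bandlimited to W/2 Hz can change when its argument is shifted by one percent of 1/W.

```python
import numpy as np

W = 1.0e6            # hypothetical bandwidth parameter in Hz (assumption)
delta = 0.01 / W     # hypothetical timing error t'_D - t_D, so W * delta = 0.01

def s_bb(t):
    """Illustrative baseband pulse sinc^2(W t / 2); its Fourier Transform is a
    triangle supported on [-W/2, W/2], so it is bandlimited to W/2 Hz."""
    return np.sinc(W * t / 2.0) ** 2   # np.sinc(x) = sin(pi x) / (pi x)

# Evaluate the pulse and its shifted version on a fine grid around the peak.
t = np.linspace(-20.0 / W, 20.0 / W, 200001)
max_change = np.max(np.abs(s_bb(t + delta) - s_bb(t)))
peak = np.max(np.abs(s_bb(t)))
relative_change = max_change / peak

print(f"largest relative change for W * delta = {W * delta}: {relative_change:.4f}")
```

The printed change is a small fraction of the peak value, in line with the claim that the shifted baseband signal is nearly indistinguishable from the unshifted one when (27.4) holds.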

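From (27.3) and (27.5) we obtain that, as long as (27.4) holds, 

\[ \begin{aligned}
X(t' + t'_D - t_D) &\approx 2 \operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t')\, e^{i 2\pi f_c (t' + t'_D - t_D)}\bigr) \\
&= 2 \operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t')\, e^{i(2\pi f_c t' + \theta)}\bigr), \quad t' \in \mathbb{R},
\end{aligned} \tag{27.6a} \]

where 

\[ \theta = 2\pi f_c (t'_D - t_D) \bmod [-\pi, \pi). \tag{27.6b} \]

(Recall that $\xi \bmod [-\pi, \pi)$ is the element in the interval $[-\pi, \pi)$ that differs from $\xi$ 
by an integer multiple of $2\pi$.) Note that even if (27.4) holds, the term $2\pi f_c (t'_D - t_D)$ 
may be much larger than 1 when $f_c \gg W$. 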
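
To make the last remark concrete, here is a small sketch with hypothetical numbers (a carrier frequency three orders of magnitude above W and a timing error of about one percent of 1/W; none of these values come from the text). It evaluates the unwrapped phase and reduces it to the interval [-pi, pi) as in (27.6b).

```python
import math

def wrap_phase(xi: float) -> float:
    """Return xi mod [-pi, pi): the element of [-pi, pi) that differs from xi
    by an integer multiple of 2*pi."""
    return (xi + math.pi) % (2.0 * math.pi) - math.pi

# Hypothetical values chosen only for illustration.
f_c = 1.0e9              # carrier frequency in Hz
W = 1.0e6                # bandwidth in Hz
timing_error = 1.03e-8   # t'_D - t_D in seconds

relative_to_bandwidth = timing_error * W          # about 0.01, so (27.4) holds
raw_phase = 2.0 * math.pi * f_c * timing_error    # 10.3 carrier cycles, in radians
theta = wrap_phase(raw_phase)

print(relative_to_bandwidth, raw_phase, theta)
```

With these numbers the timing error is negligible on the scale of 1/W, yet the unwrapped phase exceeds 60 radians, so the wrapped phase is effectively unpredictable at the receiver.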

We conclude that if the error in estimating the delay is negligible compared to the 
reciprocal of the signal bandwidth but significantly larger than the reciprocal of 
the carrier frequency, then the received waveform can be modeled as 

\[ \tilde{Y}(t') = 2 \operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t')\, e^{i(2\pi f_c t' + \theta)}\bigr) + \tilde{N}(t'), \quad t' \in \mathbb{R}, \tag{27.7} \]

where the receiver needs to determine whether $\nu$ is equal to zero or one; $(\tilde{N}(t'))$ 
is additive white Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W 
around $f_c$; and where the phase $\theta$ is unknown to the receiver. Since the phase is 
unknown to the receiver, the detection is said to be noncoherent. In the statistics 
literature an unknown parameter such as $\theta$ is called a nuisance parameter. 

It would make engineering sense to ask for a decision rule for guessing $\nu$ based 
on $(\tilde{Y}(t'))$ that would work well irrespective of the value of $\theta$, but this is not the 
question we shall ask. This question is related to "composite hypothesis testing," 
which is not treated in this book.¹ Instead we shall adopt a probabilistic approach. 
We shall assume that $\theta$ is a random variable (and therefore henceforth denote it 
by $\Theta$ and its realization by $\theta$) that is uniformly distributed over the interval $[-\pi, \pi)$ 
independently of the noise and the message, and we shall seek a decision rule that 
has the smallest average probability of error. Thus, if we denote the probability 
of error conditional on $\Theta = \theta$ by $p(\mathrm{error} \mid \theta)$, then we seek a decision rule based 
on $(Y(t))$ that minimizes 

\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} p(\mathrm{error} \mid \theta)\, \mathrm{d}\theta. \tag{27.8} \]

The conservative reader may prefer to minimize the probability of error on the 
"worst case $\theta$" 

\[ \sup_{\theta \in [-\pi, \pi)} p(\mathrm{error} \mid \theta) \tag{27.9} \]

but, miraculously, it will turn out that the decoder we shall derive to minimize (27.8) 
has a conditional probability of error $p(\mathrm{error} \mid \theta)$ that does not depend on the 
realization $\theta$ so, as we shall see in Section 27.7, our decoder also minimizes (27.9). 

27.2 The Setup 

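We next define our hypothesis testing problem. We denote time by t and the 
received waveform by $(Y(t))$ (even though in the scenario we described in 
Section 27.1 these correspond to $t'$ and $(\tilde{Y}(t'))$, i.e., to the time coordinate and to the 
corresponding signal at the receiver). We denote the RV we wish to guess by H 
and assume a uniform prior: 

\[ \Pr[H = 0] = \Pr[H = 1] = \frac{1}{2}. \tag{27.10} \]

For each $\nu \in \{0, 1\}$ the observation $(Y(t))$ is, conditionally on $H = \nu$, a SP of the 
form 

\[ Y(t) = S_\nu(t) + N(t), \quad t \in \mathbb{R}, \tag{27.11} \]

where $(N(t))$ is white Gaussian noise of positive PSD $N_0/2$ with respect to the 
bandwidth W around the carrier frequency $f_c$ (Definition 25.15.3), and where $S_\nu(t)$ 
can be described as 

\[ \begin{aligned}
S_\nu(t) &= 2 \operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t)\, e^{i(2\pi f_c t + \Theta)}\bigr) \\
&= 2 \operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr) \cos\Theta - 2 \operatorname{Im}\bigl(s_{\nu,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr) \sin\Theta \\
&= 2 \operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr) \cos\Theta + 2 \operatorname{Re}\bigl(i\, s_{\nu,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr) \sin\Theta \\
&= s_{\nu,c}(t) \cos\Theta + s_{\nu,s}(t) \sin\Theta, \quad t \in \mathbb{R},
\end{aligned} \tag{27.12} \]

where $\Theta$ is a RV that is uniformly distributed over the interval $[-\pi, \pi)$ 
independently of $(H, (N(t)))$, and where we define for $\nu \in \{0, 1\}$ 

\[ s_{\nu,c}(t) \triangleq 2 \operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr), \quad t \in \mathbb{R}, \tag{27.13a} \]

\[ s_{\nu,s}(t) \triangleq 2 \operatorname{Re}\bigl(i\, s_{\nu,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr), \quad t \in \mathbb{R}. \tag{27.13b} \]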


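1 See, for example, (Lehmann and Romano, 2005, Chapter 3). 

Notice that by (27.13) and by the relationship between inner products in baseband 
and passband (Theorem 7.6.10), 

\[ \langle \mathbf{s}_{\nu,c}, \mathbf{s}_{\nu,s} \rangle = 0, \quad \nu = 0, 1. \tag{27.14} \]

We assume that the baseband signals $\mathbf{s}_{0,\mathrm{BB}}$, $\mathbf{s}_{1,\mathrm{BB}}$ are integrable complex signals 
that are bandlimited to W/2 Hz and that they are orthogonal: 

\[ \langle \mathbf{s}_{0,\mathrm{BB}}, \mathbf{s}_{1,\mathrm{BB}} \rangle = 0. \tag{27.15} \]

Consequently, by (27.13) and Theorem 7.6.10, 

\[ \langle \mathbf{s}_{0,c}, \mathbf{s}_{1,c} \rangle = \langle \mathbf{s}_{0,s}, \mathbf{s}_{1,c} \rangle = \langle \mathbf{s}_{0,c}, \mathbf{s}_{1,s} \rangle = \langle \mathbf{s}_{0,s}, \mathbf{s}_{1,s} \rangle = 0. \tag{27.16} \]

We finally assume that the baseband signals $\mathbf{s}_{0,\mathrm{BB}}$ and $\mathbf{s}_{1,\mathrm{BB}}$ are of equal positive 
energy: 

\[ \|\mathbf{s}_{0,\mathrm{BB}}\|_2^2 = \|\mathbf{s}_{1,\mathrm{BB}}\|_2^2 > 0. \tag{27.17} \]

Defining² 

\[ E_s = 2 \|\mathbf{s}_{0,\mathrm{BB}}\|_2^2, \tag{27.18} \]

we have by the relationship between energy in baseband and passband 
(Theorem 7.6.10) 

\[ E_s = \|\mathbf{S}_0\|_2^2 = \|\mathbf{S}_1\|_2^2 = \|\mathbf{s}_{0,s}\|_2^2 = \|\mathbf{s}_{0,c}\|_2^2 = \|\mathbf{s}_{1,s}\|_2^2 = \|\mathbf{s}_{1,c}\|_2^2. \tag{27.19} \]

By (27.14), (27.16), and (27.18) 

\[ \frac{1}{\sqrt{E_s}} \bigl(\mathbf{s}_{0,c}, \mathbf{s}_{0,s}, \mathbf{s}_{1,c}, \mathbf{s}_{1,s}\bigr) \text{ is an orthonormal 4-tuple.} \tag{27.20} \]

Our problem is to guess H based on the observation $(Y(t))$. 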
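
The orthonormality claim (27.20) can be checked numerically on sampled signals. The sketch below is only an illustration under stated assumptions: the Gaussian-shaped baseband pulses, the carrier frequency, and the sampling grid are hypothetical choices (the pulses are only approximately bandlimited), and integrals are approximated by Riemann sums.

```python
import numpy as np

def inner(u, v, dt):
    """Riemann-sum approximation of <u, v> = integral of u(t) times conj(v(t))."""
    return np.sum(u * np.conj(v)) * dt

# Hypothetical sampling grid and carrier frequency (assumptions).
dt = 1e-3
t = np.arange(-8.0, 8.0 + dt / 2, dt)
f_c = 5.0

# Two orthogonal, equal-energy complex baseband pulses (illustrative choices):
# an even Gaussian and an odd, phase-rotated pulse with the same envelope.
g = np.exp(-t ** 2 / 2)
s0_bb = g.astype(complex)
s1_bb = np.sqrt(2.0) * t * g * np.exp(1j * np.pi / 3)

def passband_pair(s_bb):
    """Return (s_c, s_s) as defined in (27.13a) and (27.13b)."""
    carrier = np.exp(1j * 2 * np.pi * f_c * t)
    s_c = 2 * np.real(s_bb * carrier)
    s_s = 2 * np.real(1j * s_bb * carrier)
    return s_c, s_s

s0c, s0s = passband_pair(s0_bb)
s1c, s1s = passband_pair(s1_bb)

E_s = 2 * np.real(inner(s0_bb, s0_bb, dt))            # (27.18)
basis = np.stack([s0c, s0s, s1c, s1s]) / np.sqrt(E_s)
gram = (basis @ basis.T) * dt                          # matrix of inner products

print(np.round(gram, 6))
```

Up to numerical error, the Gram matrix is the 4 x 4 identity matrix, which is what (27.20) asserts.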

27.3 A Sufficient Statistic 

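To derive an optimal guessing rule, we begin by deriving a sufficient statistic vector. 
This vector takes value in $\mathbb{R}^4$ and enables us to simplify the guessing problem from 
one where the observation consists of a SP to one where it consists of a random 
4-vector. We shall later find an even more concise sufficient statistic vector with 
only two components. We denote the sufficient statistic vector by $\mathbf{T}$ and its four 
components by $T_{0,c}$, $T_{0,s}$, $T_{1,c}$, and $T_{1,s}$: 

\[ \mathbf{T} = \bigl(T_{0,c}, T_{0,s}, T_{1,c}, T_{1,s}\bigr)^{\mathsf{T}}. \]

We denote its realization by $\mathbf{t}$ with corresponding components 

\[ \mathbf{t} = \bigl(t_{0,c}, t_{0,s}, t_{1,c}, t_{1,s}\bigr)^{\mathsf{T}}. \]

2 The "s" in $E_s$ stands for "signal," whereas the "s" in $\mathbf{s}_{0,s}$ and $\mathbf{s}_{1,s}$ stands for "sine." 

The vector $\mathbf{T}$ is defined by 

\[ \mathbf{T} \triangleq \Bigl( \Bigl\langle \mathbf{Y}, \frac{\mathbf{s}_{0,c}}{\sqrt{E_s}} \Bigr\rangle, \Bigl\langle \mathbf{Y}, \frac{\mathbf{s}_{0,s}}{\sqrt{E_s}} \Bigr\rangle, \Bigl\langle \mathbf{Y}, \frac{\mathbf{s}_{1,c}}{\sqrt{E_s}} \Bigr\rangle, \Bigl\langle \mathbf{Y}, \frac{\mathbf{s}_{1,s}}{\sqrt{E_s}} \Bigr\rangle \Bigr)^{\mathsf{T}} \tag{27.21} \]

\[ = \frac{1}{\sqrt{E_s}} \Bigl( \int_{-\infty}^{\infty} Y(t)\, s_{0,c}(t)\, \mathrm{d}t, \ldots, \int_{-\infty}^{\infty} Y(t)\, s_{1,s}(t)\, \mathrm{d}t \Bigr)^{\mathsf{T}}. \tag{27.22} \]

We now prove that it forms a sufficient statistic for guessing H based on the 
observation $(Y(t))$. It is interesting to note that this sufficiency also holds for $\Theta$ 
of arbitrary distribution (not necessarily uniform) provided that the pair $(H, \Theta)$ is 
independent of the additive noise. Moreover, it holds even if the baseband signals 
$\mathbf{s}_{0,\mathrm{BB}}$ and $\mathbf{s}_{1,\mathrm{BB}}$ are not orthogonal. 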

Before proving the sufficiency of $\mathbf{T}$ we give a plausibility argument. To that end we 
consider a new (hypothetical) scenario where $\Theta$, rather than being uniform, now 
takes value in a finite set $\{\theta_1, \ldots, \theta_\kappa\}$ according to some arbitrary distribution. 
Suppose further that rather than just being interested in H we also wish to guess 
the value of $\Theta$. Thus, rather than just guessing H we wish to guess the pair $(H, \Theta)$, 
which takes value in the set 

\[ \bigl\{ (0, \theta_1), (1, \theta_1), (0, \theta_2), (1, \theta_2), \ldots, (0, \theta_\kappa), (1, \theta_\kappa) \bigr\}. \]

In this new scenario we have for every $\nu \in \{0, 1\}$ and every $\eta \in \{1, \ldots, \kappa\}$ that, 
conditional on $(H, \Theta) = (\nu, \theta_\eta)$, the observation $(Y(t))$ consists of the signal 
$t \mapsto s_{\nu,c}(t) \cos\theta_\eta + s_{\nu,s}(t) \sin\theta_\eta$ corrupted by additive Gaussian noise $(N(t))$. Since 
for every such $\nu$ and $\eta$ the signal $t \mapsto s_{\nu,c}(t) \cos\theta_\eta + s_{\nu,s}(t) \sin\theta_\eta$ can be written 
as a linear combination of the signals $\mathbf{s}_{0,c}$, $\mathbf{s}_{0,s}$, $\mathbf{s}_{1,c}$, and $\mathbf{s}_{1,s}$, it follows from 
Theorem 26.4.1 that in this new scenario $\mathbf{T}$ forms a sufficient statistic for guessing 
the pair $(H, \Theta)$ based on $(Y(t))$. But what if we are only interested in guessing H? 
Guessing H in this scenario reduces to guessing whether the pair $(H, \Theta)$ 
is in the set $\{(0, \theta_1), (0, \theta_2), \ldots, (0, \theta_\kappa)\}$ or in the set $\{(1, \theta_1), (1, \theta_2), \ldots, (1, \theta_\kappa)\}$. 
Consequently, by Proposition 22.4.4, in the new scenario $\mathbf{T}$ is also sufficient for 
guessing H. Since $\kappa$ in this argument can be as large as we want, it is plausible 
that $\mathbf{T}$ is also a sufficient statistic for guessing H in our original problem where $\Theta$ 
is uniform over $[-\pi, \pi)$. 

The key to the above heuristic argument is that, irrespective of the realization of $\Theta$ 
and of the value of $\nu$, the signal $\mathbf{S}_\nu$ lies in the four-dimensional subspace spanned 
by the signals $\mathbf{s}_{0,c}$, $\mathbf{s}_{0,s}$, $\mathbf{s}_{1,c}$, and $\mathbf{s}_{1,s}$. The sufficiency thus follows from a more 
general theorem that we state next. 

Theorem 27.3.1 (White Gaussian Noise with Nuisance Parameters). Let $\mathcal{V}$ be a 
d-dimensional subspace of the set of all integrable signals that are bandlimited to W 
Hz, and let $(\phi_1, \ldots, \phi_d)$ be an orthonormal basis for $\mathcal{V}$. Let the RV M take value 
in a finite set $\mathcal{M}$. Suppose that, conditional on M = m, the SP $(Y(t))$ is given by 

\[ Y(t) = \sum_{\ell=1}^{d} A^{(\ell)} \phi_\ell(t) + N(t), \tag{27.23} \]

where $\mathbf{A} = (A^{(1)}, \ldots, A^{(d)})^{\mathsf{T}}$ is a random d-vector whose law typically depends 
on m, where the SP $(N(t))$ is white Gaussian noise with respect to the 
bandwidth W, and where $(N(t))$ is independent of the pair $(M, \mathbf{A})$. Then the vector 

\[ \mathbf{T} = \bigl( \langle \mathbf{Y}, \phi_1 \rangle, \ldots, \langle \mathbf{Y}, \phi_d \rangle \bigr)^{\mathsf{T}} \tag{27.24} \]

forms a sufficient statistic for guessing M based on $(Y(t))$. 

The theorem also holds in passband, i.e., if $\mathcal{V}$ is a d-dimensional subspace of the set 
of all integrable signals that are bandlimited to W Hz around the carrier frequency $f_c$ 
and if $(N(t))$ is white with respect to the bandwidth W around $f_c$. 

Note 27.3.2. Theorem 27.3.1 continues to hold even if $(\phi_1, \ldots, \phi_d)$ are not 
orthonormal; it suffices that they form a basis for $\mathcal{V}$. 

Proof of Note 27.3.2. This follows from Proposition 22.4.2 and from the 
observation that if $(\mathbf{u}_1, \ldots, \mathbf{u}_d)$ forms a basis for $\mathcal{V}$ and if $(\mathbf{v}_1, \ldots, \mathbf{v}_d)$ forms another 
basis for $\mathcal{V}$, then the inner products $\{\langle \mathbf{Y}, \mathbf{v}_\ell \rangle\}_{\ell=1}^{d}$ are computable from the inner 
products $\{\langle \mathbf{Y}, \mathbf{u}_\ell \rangle\}_{\ell=1}^{d}$ (Lemma 25.10.3). □ 

Before presenting the proof of Theorem 27.3.1 we give two examples of its 
application. The first is a simple case where, conditional on M, the vector $\mathbf{A}$ is 
deterministic. This corresponds to the problem of detecting a known signal 
corrupted by additive white Gaussian noise. This case was treated in Theorem 26.4.1 
and slightly generalized in Corollary 26.4.2. We thus see that Theorem 27.3.1 is a 
generalization of Theorem 26.4.1 & Corollary 26.4.2.³ 

The second example of the application of this theorem is for the noncoherent 
detection problem at hand. Here d = 4 and 

\[ \mathcal{V} = \operatorname{span}\bigl(\mathbf{s}_{0,c}, \mathbf{s}_{0,s}, \mathbf{s}_{1,c}, \mathbf{s}_{1,s}\bigr), \tag{27.25} \]

with $\phi_1 = \mathbf{s}_{0,c}/\sqrt{E_s}$, $\phi_2 = \mathbf{s}_{0,s}/\sqrt{E_s}$, $\phi_3 = \mathbf{s}_{1,c}/\sqrt{E_s}$, and $\phi_4 = \mathbf{s}_{1,s}/\sqrt{E_s}$. We note 
that, conditional on H = 0, the received waveform $(Y(t))$ can be written in the 
form (27.23) where $A^{(3)}$ & $A^{(4)}$ are deterministically zero and, by (27.12), the pair 
$(A^{(1)}, A^{(2)}) = (\sqrt{E_s} \cos\Theta, \sqrt{E_s} \sin\Theta)$ is uniformly distributed over the 
origin-centered circle of radius $\sqrt{E_s}$: 

\[ \bigl(A^{(1)}\bigr)^2 + \bigl(A^{(2)}\bigr)^2 = E_s. \]

Similarly, conditional on H = 1, the random variables $A^{(1)}$ and $A^{(2)}$ are 
deterministically zero and the pair $(A^{(3)}, A^{(4)})$ is uniformly distributed over the same circle. 
Thus, once we prove Theorem 27.3.1, it will follow that the vector in (27.22) forms 
a sufficient statistic. 

Proof of Theorem 27.3.1. To derive the sufficiency of $\mathbf{T}$ we need to show that for 
every $\eta \in \mathbb{N}$ and any choice of the epochs $t_1, \ldots, t_\eta \in \mathbb{R}$ the random vector $\mathbf{T}$ forms 
a sufficient statistic for guessing M based on $(Y(t_1), \ldots, Y(t_\eta), \mathbf{T})$. That is, we 
need to show that, irrespective of the prior distribution of M, 

\[ M \multimap\!\!- \mathbf{T} \multimap\!\!- \bigl(Y(t_1), \ldots, Y(t_\eta)\bigr). \tag{27.26} \]

3 The setup of Corollary 26.4.2 may appear slightly more general than our setting because 
the signals $\tilde{\mathbf{s}}_1, \ldots, \tilde{\mathbf{s}}_n$ are not assumed to be orthonormal. But, using the linearity of the inner 
product (Lemma 25.10.3), it is readily seen that from the inner products (27.24) one can compute 
the inner products $\{\langle \mathbf{Y}, \tilde{\mathbf{s}}_j \rangle\}_{j=1}^{n}$ and vice versa. 

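Define the random variables 

\[ \begin{aligned}
\tilde{Y}(t_\kappa) &\triangleq Y(t_\kappa) - \sum_{\ell=1}^{d} \phi_\ell(t_\kappa) \langle \mathbf{Y}, \phi_\ell \rangle && (27.27) \\
&= Y(t_\kappa) - \sum_{\ell=1}^{d} \phi_\ell(t_\kappa)\, T^{(\ell)}, \quad \kappa = 1, \ldots, \eta && (27.28)
\end{aligned} \]

and stack them in a vector $\tilde{\mathbf{Y}} = (\tilde{Y}(t_1), \ldots, \tilde{Y}(t_\eta))^{\mathsf{T}}$. Since, conditional on $\mathbf{T}$, the 
random variables $\tilde{Y}(t_\kappa)$ and $Y(t_\kappa)$ only differ by a constant (which depends on $\mathbf{T}$), 
it follows that to prove (27.26) it suffices to prove 

\[ M \multimap\!\!- \mathbf{T} \multimap\!\!- \tilde{\mathbf{Y}}. \tag{27.29} \]

Instead of proving (27.29), we shall prove 

\[ (M, \mathbf{A}) \multimap\!\!- \mathbf{T} \multimap\!\!- \tilde{\mathbf{Y}}, \tag{27.30} \]

which implies (27.29). (If the pair (X, Y) is independent of Z, then X is 
independent of Z. Likewise if we condition on T: if conditional on T the pair (X, Y) is 
independent of Z, then conditional on T we also have that X is independent of Z.) 

By Proposition 22.5.5 it follows that to establish (27.30) it suffices to show that 

\[ \tilde{\mathbf{Y}} \text{ is independent of } (M, \mathbf{A}) \tag{27.31} \]

and 

\[ \mathbf{T} \multimap\!\!- (M, \mathbf{A}) \multimap\!\!- \tilde{\mathbf{Y}}. \tag{27.32} \]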

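We first prove (27.31) by showing that conditional on $(M, \mathbf{A}) = (m, \mathbf{a})$ the random 
vector $\tilde{\mathbf{Y}}$ is Gaussian with a mean vector and a covariance matrix that do not 
depend on m and $\mathbf{a}$. That conditional on $(M, \mathbf{A}) = (m, \mathbf{a})$ the random vector $\tilde{\mathbf{Y}}$ is 
Gaussian follows because under this conditioning $\mathbf{T}$ and $Y(t_1), \ldots, Y(t_\eta)$ are jointly 
Gaussian (Theorem 25.12.1) so the result of linearly transforming them to form $\tilde{\mathbf{Y}}$ 
must also be Gaussian (Proposition 23.6.3). For the mean we have from (27.27) 

\[ \begin{aligned}
\mathrm{E}\bigl[\tilde{Y}(t_\kappa) \bigm| (M, \mathbf{A}) = (m, \mathbf{a})\bigr]
&= \mathrm{E}\Bigl[ Y(t_\kappa) - \sum_{\ell=1}^{d} \phi_\ell(t_\kappa) \langle \mathbf{Y}, \phi_\ell \rangle \Bigm| (M, \mathbf{A}) = (m, \mathbf{a}) \Bigr] \\
&= \mathrm{E}\bigl[ Y(t_\kappa) \bigm| (M, \mathbf{A}) = (m, \mathbf{a}) \bigr] - \sum_{\ell=1}^{d} \phi_\ell(t_\kappa)\, \mathrm{E}\bigl[ \langle \mathbf{Y}, \phi_\ell \rangle \bigm| (M, \mathbf{A}) = (m, \mathbf{a}) \bigr] \\
&= \sum_{\ell=1}^{d} a^{(\ell)} \phi_\ell(t_\kappa) - \sum_{\ell=1}^{d} \phi_\ell(t_\kappa) \Bigl\langle \sum_{\ell'=1}^{d} a^{(\ell')} \phi_{\ell'}, \phi_\ell \Bigr\rangle \\
&= \sum_{\ell=1}^{d} a^{(\ell)} \phi_\ell(t_\kappa) - \sum_{\ell=1}^{d} a^{(\ell)} \phi_\ell(t_\kappa) \\
&= 0, \quad \kappa \in \{1, \ldots, \eta\},
\end{aligned} \]

where the first equality follows from the definition of $\tilde{Y}(t_\kappa)$; the second from the 
linearity of conditional expectation; the third because $(N(t))$ is of zero mean; and 
the fourth from the orthonormality of $(\phi_1, \ldots, \phi_d)$. We thus conclude that for 
every $m \in \mathcal{M}$ and every $\mathbf{a} \in \mathbb{R}^d$, 

\[ \mathrm{E}\bigl[ \tilde{\mathbf{Y}} \bigm| (M, \mathbf{A}) = (m, \mathbf{a}) \bigr] = \mathbf{0}. \tag{27.33} \]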

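Likewise, the conditional covariance matrix of $\tilde{\mathbf{Y}}$ given $(M, \mathbf{A}) = (m, \mathbf{a})$ does not 
depend on the value of m and $\mathbf{a}$: by (27.37), it is the covariance matrix of the vector 
whose $\kappa$-th component is $N(t_\kappa) - \sum_{\ell=1}^{d} \langle \mathbf{N}, \phi_\ell \rangle \phi_\ell(t_\kappa)$, which involves only the noise. 
By establishing that, conditional on $(M, \mathbf{A}) = (m, \mathbf{a})$, the vector $\tilde{\mathbf{Y}}$ has a 
multivariate Gaussian distribution whose mean vector and covariance matrix do not depend 
on $(m, \mathbf{a})$ we have established (27.31). 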

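We next prove (27.32). By Theorem 25.12.1, we have that, conditional on $(M, \mathbf{A})$, 
the random vectors $\mathbf{T}$ and $\tilde{\mathbf{Y}}$ are jointly Gaussian. To establish that they are 
conditionally independent given $(M, \mathbf{A})$ it thus suffices to establish that they are 
conditionally uncorrelated (Proposition 23.7.3). We now proceed to compute their 
conditional covariance and show that it is zero. Since the conditional mean of $\tilde{\mathbf{Y}}$ 
is zero (27.33), it follows that we need to show that 

\[ \mathrm{E}\Bigl[ \bigl( T^{(\ell)} - \mathrm{E}\bigl[ T^{(\ell)} \bigm| (M, \mathbf{A}) = (m, \mathbf{a}) \bigr] \bigr)\, \tilde{Y}(t_\kappa) \Bigm| (M, \mathbf{A}) = (m, \mathbf{a}) \Bigr] = 0, \]
\[ m \in \mathcal{M}, \ \mathbf{a} \in \mathbb{R}^d, \ \ell \in \{1, \ldots, d\}, \ \kappa \in \{1, \ldots, \eta\}. \tag{27.34} \]

Before embarking on this calculation, we make two preliminary algebraic 
manipulations. The first entails using (27.23), (27.24), and the orthonormality of 
$(\phi_1, \ldots, \phi_d)$ to express $T^{(\ell)}$ as 

\[ T^{(\ell)} = A^{(\ell)} + \langle \mathbf{N}, \phi_\ell \rangle, \quad \ell = 1, \ldots, d. \tag{27.35} \]

This representation makes it clear that 

\[ T^{(\ell)} - \mathrm{E}\bigl[ T^{(\ell)} \bigm| (M, \mathbf{A}) = (m, \mathbf{a}) \bigr] = \langle \mathbf{N}, \phi_\ell \rangle, \quad \ell = 1, \ldots, d. \tag{27.36} \]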

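The second manipulation involves rewriting $\tilde{Y}(t_\kappa)$ using (27.23) and (27.27) as: 

\[ \begin{aligned}
\tilde{Y}(t_\kappa) &= Y(t_\kappa) - \sum_{\ell'=1}^{d} \phi_{\ell'}(t_\kappa)\, T^{(\ell')} \\
&= \sum_{\ell'=1}^{d} A^{(\ell')} \phi_{\ell'}(t_\kappa) + N(t_\kappa) - \sum_{\ell'=1}^{d} \phi_{\ell'}(t_\kappa)\, T^{(\ell')} \\
&= N(t_\kappa) - \sum_{\ell'=1}^{d} \bigl( T^{(\ell')} - A^{(\ell')} \bigr) \phi_{\ell'}(t_\kappa) \\
&= N(t_\kappa) - \sum_{\ell'=1}^{d} \langle \mathbf{N}, \phi_{\ell'} \rangle\, \phi_{\ell'}(t_\kappa), \quad \kappa \in \{1, \ldots, \eta\},
\end{aligned} \tag{27.37} \]

where the first equality follows from the definition of $\tilde{Y}(t_\kappa)$ (27.27); the second 
from (27.23); the third by rearranging terms; and the final equality from (27.35). 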

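It follows from (27.36) and (27.37) that to establish (27.34) it suffices to show that 
for every $\ell \in \{1, \ldots, d\}$ and $\kappa \in \{1, \ldots, \eta\}$ 

\[ \mathrm{E}\Bigl[ \langle \mathbf{N}, \phi_\ell \rangle \Bigl( N(t_\kappa) - \sum_{\ell'=1}^{d} \langle \mathbf{N}, \phi_{\ell'} \rangle\, \phi_{\ell'}(t_\kappa) \Bigr) \Bigr] = 0. \tag{27.38} \]

This follows from Proposition 25.15.2 and the orthonormality of $(\phi_1, \ldots, \phi_d)$: 

\[ \begin{aligned}
\mathrm{E}\Bigl[ \langle \mathbf{N}, \phi_\ell \rangle \Bigl( N(t_\kappa) - \sum_{\ell'=1}^{d} \langle \mathbf{N}, \phi_{\ell'} \rangle\, \phi_{\ell'}(t_\kappa) \Bigr) \Bigr]
&= \mathrm{E}\bigl[ \langle \mathbf{N}, \phi_\ell \rangle N(t_\kappa) \bigr] - \sum_{\ell'=1}^{d} \phi_{\ell'}(t_\kappa)\, \mathrm{E}\bigl[ \langle \mathbf{N}, \phi_\ell \rangle \langle \mathbf{N}, \phi_{\ell'} \rangle \bigr] \\
&= \frac{N_0}{2} \phi_\ell(t_\kappa) - \sum_{\ell'=1}^{d} \phi_{\ell'}(t_\kappa)\, \frac{N_0}{2}\, \mathrm{I}\{\ell = \ell'\} \\
&= \frac{N_0}{2} \phi_\ell(t_\kappa) - \frac{N_0}{2} \phi_\ell(t_\kappa) \\
&= 0.
\end{aligned} \]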



Combining (27.38) with (27.36) and (27.37) establishes (27.34), i.e., that for every 
$m \in \mathcal{M}$, $\mathbf{a} \in \mathbb{R}^d$, $\ell \in \{1, \ldots, d\}$, and $\kappa \in \{1, \ldots, \eta\}$ 

\[ \operatorname{Cov}\bigl[ T^{(\ell)}, \tilde{Y}(t_\kappa) \bigm| (M, \mathbf{A}) = (m, \mathbf{a}) \bigr] = 0. \tag{27.39} \]

This combines with the conditional joint Gaussianity of the vectors $\mathbf{T}$ and $\tilde{\mathbf{Y}}$ given 
$(M, \mathbf{A})$ to establish (27.32). The combination of (27.32) and (27.31) implies 
(27.30), which implies (27.29). Since (27.29) is equivalent to (27.26), this 
establishes the theorem for baseband signals. 

For passband signals the proof is almost identical except that in deriving (27.38) 
we use Note 25.15.4 instead of Proposition 25.15.2. □ 
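
The heart of the proof has a finite-dimensional analog that can be checked directly. The sketch below is an illustration only, not the continuous-time argument: it assumes an n-dimensional observation Y = Phi a + N with IID Gaussian noise of variance sigma2 (a hypothetical stand-in for N_0/2) and a matrix Phi with orthonormal columns. It verifies that the residual Y - Phi T does not depend on a and that its cross-covariance with T = Phi^T Y vanishes, mirroring (27.37) and (27.39).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 4          # hypothetical dimensions (assumptions)
sigma2 = 0.7          # noise variance, standing in for N_0/2

# Orthonormal columns phi_1, ..., phi_d obtained from a QR decomposition.
Phi, _ = np.linalg.qr(rng.standard_normal((n, d)))

# With Y = Phi a + N and N ~ N(0, sigma2 * I):
#   T       = Phi^T Y            = a + Phi^T N        (analog of (27.35))
#   Y_tilde = Y - Phi T          = (I - Phi Phi^T) N  (analog of (27.37))
# so Cov[T, Y_tilde] = Phi^T (sigma2 * I) (I - Phi Phi^T).
P_perp = np.eye(n) - Phi @ Phi.T
cross_cov = sigma2 * Phi.T @ P_perp

def residual(y):
    """Subtract from y its projection onto the span of the columns of Phi."""
    return y - Phi @ (Phi.T @ y)

# The residual is the same for any coefficient vector a, given the same noise.
noise = np.sqrt(sigma2) * rng.standard_normal(n)
a_small = rng.standard_normal(d)
a_large = 100.0 * rng.standard_normal(d)
res_small = residual(Phi @ a_small + noise)
res_large = residual(Phi @ a_large + noise)

print(np.max(np.abs(cross_cov)), np.max(np.abs(res_small - res_large)))
```

Both printed numbers are zero up to rounding: the residual carries no information about the coefficients and is uncorrelated with the projections, which under joint Gaussianity gives independence.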
Having established in the previous section that the vector $\mathbf{T}$ defined in (27.21) 
forms a sufficient statistic for guessing H based on $(Y(t))$, we next proceed to 
calculate its conditional distribution given H. This will allow us to compute the 
likelihood-ratio $f_{\mathbf{T}|H=0}(\mathbf{t}) / f_{\mathbf{T}|H=1}(\mathbf{t})$ and to thus obtain an optimal guessing rule. 

Rather than computing the conditional distribution directly, we begin with the 
simpler conditional distribution of $\mathbf{T}$ given $(H, \Theta)$. Conditional on $(H, \Theta)$, the 
vector $\mathbf{T}$ is Gaussian (Theorem 25.12.1). Consequently, to compute its conditional 
distribution we only need to compute its conditional mean vector and covariance 
matrix, which we proceed to do. Conditional on $(H, \Theta) = (\nu, \theta)$, the observed 
process $(Y(t))$ can be expressed as 

\[ Y(t) = s_{\nu,c}(t) \cos\theta + s_{\nu,s}(t) \sin\theta + N(t), \quad t \in \mathbb{R}. \tag{27.40} \]

Hence, since $(N(t))$ is of zero mean, we have from (27.22) and (27.20) 

\[ \mathrm{E}\bigl[ \mathbf{T} \bigm| (H, \Theta) = (0, \theta) \bigr] = \sqrt{E_s}\, \bigl( \cos\theta, \sin\theta, 0, 0 \bigr)^{\mathsf{T}}, \tag{27.41a} \]

\[ \mathrm{E}\bigl[ \mathbf{T} \bigm| (H, \Theta) = (1, \theta) \bigr] = \sqrt{E_s}\, \bigl( 0, 0, \cos\theta, \sin\theta \bigr)^{\mathsf{T}}, \tag{27.41b} \]

as we next calculate. The calculation is a bit tedious because we need to compute 
the conditional mean of each of four random variables conditional on each of two 
hypotheses, thus requiring eight calculations, which are all very similar but not 
identical. We shall carry out only one calculation: 

\[ \begin{aligned}
\mathrm{E}\bigl[ T_{0,c} \bigm| (H, \Theta) = (0, \theta) \bigr]
&= \frac{1}{\sqrt{E_s}} \Bigl( \bigl\langle \mathbf{s}_{0,c} \cos\theta + \mathbf{s}_{0,s} \sin\theta, \mathbf{s}_{0,c} \bigr\rangle + \mathrm{E}\bigl[ \langle \mathbf{N}, \mathbf{s}_{0,c} \rangle \bigr] \Bigr) \\
&= \frac{1}{\sqrt{E_s}} \bigl\langle \mathbf{s}_{0,c} \cos\theta + \mathbf{s}_{0,s} \sin\theta, \mathbf{s}_{0,c} \bigr\rangle \\
&= \frac{1}{\sqrt{E_s}} \Bigl( \|\mathbf{s}_{0,c}\|_2^2 \cos\theta + \langle \mathbf{s}_{0,s}, \mathbf{s}_{0,c} \rangle \sin\theta \Bigr) \\
&= \sqrt{E_s} \cos\theta,
\end{aligned} \]

where the first equality follows from (27.40); the second because $(N(t))$ is of zero 
mean (Proposition 25.10.1); the third from the linearity of the inner product and 
by writing $\langle \mathbf{s}_{0,c}, \mathbf{s}_{0,c} \rangle$ as $\|\mathbf{s}_{0,c}\|_2^2$; and the final equality from (27.20). 

We next compute the conditional covariance matrix of $\mathbf{T}$ given $(H, \Theta) = (\nu, \theta)$. By 
the orthonormality (27.20) and the whiteness of the noise (Proposition 25.15.2) we 
have that, irrespective of $\nu$ and $\theta$, this conditional covariance matrix is given by 
the $4 \times 4$ matrix $(N_0/2)\, \mathsf{I}_4$, where $\mathsf{I}_4$ is the $4 \times 4$ identity matrix. 

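Using the explicit form of the Gaussian distribution (19.6) and defining 

\[ \sigma^2 \triangleq \frac{N_0}{2}, \tag{27.42} \]

we can thus write the conditional density as 

\[ \begin{aligned}
f_{\mathbf{T}|H=0,\Theta=\theta}(\mathbf{t})
&= \frac{1}{(2\pi\sigma^2)^2} \exp\Bigl( -\frac{1}{2\sigma^2} \Bigl( \bigl(t_{0,c} - \sqrt{E_s} \cos\theta\bigr)^2 + \bigl(t_{0,s} - \sqrt{E_s} \sin\theta\bigr)^2 + t_{1,c}^2 + t_{1,s}^2 \Bigr) \Bigr) \\
&= \frac{1}{(2\pi\sigma^2)^2} \exp\Bigl( -\frac{E_s}{2\sigma^2} - \frac{t_0}{2} - \frac{t_1}{2} \Bigr) \\
&\qquad \times \exp\Bigl( \frac{1}{\sigma^2} \sqrt{E_s}\, t_{0,c} \cos\theta + \frac{1}{\sigma^2} \sqrt{E_s}\, t_{0,s} \sin\theta \Bigr), \quad \mathbf{t} \in \mathbb{R}^4,
\end{aligned} \tag{27.43} \]

where the second equality follows by opening the squares, by using the identity 
$\cos^2\theta + \sin^2\theta = 1$, and by defining 

\[ T_0 \triangleq \frac{T_{0,c}^2 + T_{0,s}^2}{\sigma^2}, \qquad t_0 \triangleq \frac{t_{0,c}^2 + t_{0,s}^2}{\sigma^2}, \tag{27.44a} \]

\[ T_1 \triangleq \frac{T_{1,c}^2 + T_{1,s}^2}{\sigma^2}, \qquad t_1 \triangleq \frac{t_{1,c}^2 + t_{1,s}^2}{\sigma^2}. \tag{27.44b} \]

(We define $T_0$ and $T_1$ not only to simplify the typesetting but also for ulterior 
motives that have to do with the further reduction of the sufficient statistic from 
a random vector of four components to one with only two, namely, the vector 
$(T_0, T_1)^{\mathsf{T}}$.) 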

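To derive $f_{\mathbf{T}|H=0}(\mathbf{t})$ (unconditioned on $\Theta$) we can integrate out $\Theta$. Thus, for every 
$\mathbf{t} = (t_{0,c}, t_{0,s}, t_{1,c}, t_{1,s})^{\mathsf{T}}$ in $\mathbb{R}^4$ 

\[ \begin{aligned}
f_{\mathbf{T}|H=0}(\mathbf{t})
&= \int_{-\pi}^{\pi} f_{\Theta|H=0}(\theta)\, f_{\mathbf{T}|H=0,\Theta=\theta}(\mathbf{t})\, \mathrm{d}\theta \\
&= \int_{-\pi}^{\pi} f_{\Theta}(\theta)\, f_{\mathbf{T}|H=0,\Theta=\theta}(\mathbf{t})\, \mathrm{d}\theta \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} f_{\mathbf{T}|H=0,\Theta=\theta}(\mathbf{t})\, \mathrm{d}\theta \\
&= \frac{1}{(2\pi\sigma^2)^2}\, e^{-E_s/(2\sigma^2)}\, e^{-t_1/2}\, e^{-t_0/2} \\
&\qquad \times \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp\Bigl( \frac{1}{\sigma^2} \sqrt{E_s}\, t_{0,c} \cos\theta + \frac{1}{\sigma^2} \sqrt{E_s}\, t_{0,s} \sin\theta \Bigr) \mathrm{d}\theta \\
&= \frac{1}{(2\pi\sigma^2)^2}\, e^{-E_s/(2\sigma^2)}\, e^{-(t_0 + t_1)/2} \\
&\qquad \times \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp\Bigl( \sqrt{\frac{E_s}{\sigma^2}} \sqrt{t_0}\, \cos\bigl( \theta - \tan^{-1}(t_{0,s}/t_{0,c}) \bigr) \Bigr) \mathrm{d}\theta \\
&= \frac{1}{(2\pi\sigma^2)^2}\, e^{-E_s/(2\sigma^2)}\, e^{-(t_0 + t_1)/2} \\
&\qquad \times \frac{1}{2\pi} \int_{-\pi - \tan^{-1}(t_{0,s}/t_{0,c})}^{\pi - \tan^{-1}(t_{0,s}/t_{0,c})} \exp\Bigl( \sqrt{\frac{E_s}{\sigma^2}} \sqrt{t_0}\, \cos\psi \Bigr) \mathrm{d}\psi \\
&= \frac{1}{(2\pi\sigma^2)^2}\, e^{-E_s/(2\sigma^2)}\, e^{-(t_0 + t_1)/2}\, \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp\Bigl( \sqrt{\frac{E_s}{\sigma^2}} \sqrt{t_0}\, \cos\psi \Bigr) \mathrm{d}\psi \\
&= \frac{1}{(2\pi\sigma^2)^2}\, e^{-E_s/(2\sigma^2)}\, e^{-(t_0 + t_1)/2}\, \mathrm{I}_0\Bigl( \sqrt{\frac{E_s}{\sigma^2}} \sqrt{t_0} \Bigr),
\end{aligned} \tag{27.45} \]

where the first equality follows by averaging out $\Theta$; the second because $\Theta$ and H 
are independent; the third because $\Theta$ is uniform; the fourth by the explicit form of 
$f_{\mathbf{T}|H=0,\Theta=\theta}(\mathbf{t})$ (27.43); the fifth by the trigonometric identity 

\[ \alpha \cos\theta + \beta \sin\theta = \sqrt{\alpha^2 + \beta^2}\, \cos\bigl( \theta - \tan^{-1}(\beta/\alpha) \bigr); \tag{27.46} \]

the sixth by the change of variable $\psi = \theta - \tan^{-1}(t_{0,s}/t_{0,c})$; the seventh from 
the periodicity of the cosine function; and the final equality by recalling that the 
zeroth-order modified Bessel function $\mathrm{I}_0(\cdot)$ is defined by 

\[ \begin{aligned}
\mathrm{I}_0(\xi) &\triangleq \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{\xi \cos\phi}\, \mathrm{d}\phi && (27.47) \\
&= \frac{1}{\pi} \int_{0}^{\pi} e^{\xi \cos\phi}\, \mathrm{d}\phi \\
&= \frac{1}{\pi} \int_{0}^{\pi/2} \bigl( e^{\xi \cos\phi} + e^{-\xi \cos\phi} \bigr)\, \mathrm{d}\phi, \quad \xi \in \mathbb{R}. && (27.48)
\end{aligned} \]

By symmetry, 

\[ f_{\mathbf{T}|H=1}(\mathbf{t}) = \frac{1}{(2\pi\sigma^2)^2}\, e^{-E_s/(2\sigma^2)}\, e^{-(t_0 + t_1)/2}\, \mathrm{I}_0\Bigl( \sqrt{\frac{E_s}{\sigma^2}} \sqrt{t_1} \Bigr), \quad \mathbf{t} \in \mathbb{R}^4. \tag{27.49} \]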
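
The closed form (27.45) and the two integral representations (27.47) and (27.48) can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: the values of E_s, sigma^2, and the test point t are hypothetical, integrals are approximated by simple quadrature rules, and NumPy's `np.i0` is used as the reference implementation of the zeroth-order modified Bessel function.

```python
import numpy as np

def i0_by_definition(xi, n=4096):
    """(27.47): average of exp(xi * cos(phi)) over one period (rectangle rule)."""
    phi = np.linspace(-np.pi, np.pi, n, endpoint=False)
    return float(np.mean(np.exp(xi * np.cos(phi))))

def i0_quarter_period(xi, n=20000):
    """(27.48): (1/pi) times the integral over [0, pi/2] (midpoint rule)."""
    h = (np.pi / 2) / n
    phi = (np.arange(n) + 0.5) * h
    integrand = np.exp(xi * np.cos(phi)) + np.exp(-xi * np.cos(phi))
    return float(np.sum(integrand) * h / np.pi)

def density_given_theta(t, theta, E_s, sigma2):
    """(27.43): Gaussian density of T given H = 0 and Theta = theta."""
    mean = np.sqrt(E_s) * np.array([np.cos(theta), np.sin(theta), 0.0, 0.0])
    exponent = -np.sum((t - mean) ** 2) / (2 * sigma2)
    return float(np.exp(exponent) / (2 * np.pi * sigma2) ** 2)

def density_closed_form(t, E_s, sigma2):
    """(27.45): the density of T given H = 0 after averaging over Theta."""
    t0 = (t[0] ** 2 + t[1] ** 2) / sigma2
    t1 = (t[2] ** 2 + t[3] ** 2) / sigma2
    bessel = np.i0(np.sqrt(E_s / sigma2) * np.sqrt(t0))
    value = np.exp(-E_s / (2 * sigma2)) * np.exp(-(t0 + t1) / 2) * bessel
    return float(value / (2 * np.pi * sigma2) ** 2)

# Hypothetical parameters and test point (assumptions for illustration only).
E_s, sigma2 = 2.0, 0.5
t = np.array([0.8, -1.1, 0.3, 0.5])
thetas = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
averaged = float(np.mean([density_given_theta(t, th, E_s, sigma2) for th in thetas]))
closed = density_closed_form(t, E_s, sigma2)

print(averaged, closed)
```

The averaged conditional density and the Bessel-function expression agree to within quadrature error, and both integral forms of the Bessel function match the library value.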

27.5 An Optimal Detector 

By (27.45) and (27.49), the likelihood-ratio is given by 

\[ \frac{f_{\mathbf{T}|H=0}(\mathbf{t})}{f_{\mathbf{T}|H=1}(\mathbf{t})} = \frac{\mathrm{I}_0\bigl( \sqrt{E_s/\sigma^2}\, \sqrt{t_0} \bigr)}{\mathrm{I}_0\bigl( \sqrt{E_s/\sigma^2}\, \sqrt{t_1} \bigr)}, \quad \mathbf{t} \in \mathbb{R}^4, \tag{27.50} \]

which is computable from $t_0$ and $t_1$. This proves that the pair $(T_0, T_1)$ defined in 
(27.44) forms a sufficient statistic for guessing H based on $\mathbf{T}$ (Definition 20.12.2). 
Having identified $(T_0, T_1)$ as a sufficient statistic, we now proceed to derive an 
optimal decision rule using two different methods. The first method, which is 
summarized in (20.79), ignores the fact that $(T_0, T_1)$ is sufficient and proceeds to 
base the decision on the likelihood-ratio of $\mathbf{T}$ (27.50). The second method, which 
is summarized in (20.80), bases the decision on the likelihood-ratio of the pair 
$(T_0, T_1)$. 

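Method 1: Since we assumed a uniform prior (27.10), an optimal decision rule
is to guess "H = 0" whenever $f_{\mathbf{T}|H=0}(\mathbf{t}) / f_{\mathbf{T}|H=1}(\mathbf{t}) > 1$, which, by (27.50), is
equivalent to

\[
\text{Guess ``$H = 0$'' if } I_0\Bigl( \sqrt{\frac{E_s}{\sigma^2}}\, \sqrt{t_0} \Bigr) > I_0\Bigl( \sqrt{\frac{E_s}{\sigma^2}}\, \sqrt{t_1} \Bigr). \tag{27.51}
\]

This rule can be further simplified by noting that $I_0(\xi)$ is (strictly) increasing in $\xi$
for $\xi \ge 0$. (This can be verified by computing the derivative from (27.48)

\[
\frac{d}{d\xi} I_0(\xi) = \frac{1}{\pi} \int_{0}^{\pi/2} \cos\phi\, \bigl( e^{\xi \cos\phi} - e^{-\xi \cos\phi} \bigr)\, d\phi
\]

and by noting that for $\xi > 0$ the integrand is positive for all $\phi \in (0, \pi/2)$.) Consequently,
the function $\xi \mapsto I_0(\sqrt{\xi})$ is also (strictly) increasing and the guessing rule
(27.51) is thus equivalent to the rule

\[
\text{Guess ``$H = 0$'' if } t_0 > t_1. \tag{27.52}
\]

In terms of the observable $(Y(t))$ this can be paraphrased using (27.44) and (27.22)
as guessing "H = 0" whenever

\begin{align*}
&\Bigl( \int Y(t)\, \mathrm{Re}\bigl( s_{0,\mathrm{BB}}(t)\, e^{i 2\pi f_c t} \bigr)\, dt \Bigr)^2
+ \Bigl( \int Y(t)\, \mathrm{Re}\bigl( i\, s_{0,\mathrm{BB}}(t)\, e^{i 2\pi f_c t} \bigr)\, dt \Bigr)^2 \\
&\qquad > \Bigl( \int Y(t)\, \mathrm{Re}\bigl( s_{1,\mathrm{BB}}(t)\, e^{i 2\pi f_c t} \bigr)\, dt \Bigr)^2
+ \Bigl( \int Y(t)\, \mathrm{Re}\bigl( i\, s_{1,\mathrm{BB}}(t)\, e^{i 2\pi f_c t} \bigr)\, dt \Bigr)^2.
\end{align*}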
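The rule of Method 1, expressed through the four correlations of the received waveform, can be sketched on sampled data. This is a hedged illustration only: the function `noncoherent_guess`, the sampling grid, the carrier frequency, and the two rectangular-windowed test signals are all assumptions made for the example (in particular the test signals are time-limited rather than bandlimited), and the integrals are replaced by Riemann sums.

```python
import cmath
import math


def noncoherent_guess(y, s0_bb, s1_bb, fc, dt):
    """Guess H by comparing the sums of squared in-phase and quadrature correlations."""

    def energy_stat(s_bb):
        in_phase = 0.0
        quadrature = 0.0
        for k, (yk, sk) in enumerate(zip(y, s_bb)):
            carrier = cmath.exp(2j * math.pi * fc * k * dt)
            in_phase += yk * (sk * carrier).real * dt          # Y against Re(s e^{i 2 pi fc t})
            quadrature += yk * (1j * sk * carrier).real * dt   # Y against Re(i s e^{i 2 pi fc t})
        return in_phase ** 2 + quadrature ** 2

    return 0 if energy_stat(s0_bb) > energy_stat(s1_bb) else 1


# Hypothetical test setup: two orthogonal baseband signals on [0, 1).
N = 2000
DT = 1.0 / N
FC = 20.0
S0_BB = [1.0 + 0.0j for _ in range(N)]
S1_BB = [cmath.exp(2j * math.pi * k * DT) for k in range(N)]


def received_noiseless(nu, theta):
    """Samples of 2 Re(s_nu,BB(t) e^{i(2 pi fc t + theta)}) without noise."""
    s_bb = S0_BB if nu == 0 else S1_BB
    return [2.0 * (s_bb[k] * cmath.exp(1j * (2.0 * math.pi * FC * k * DT + theta))).real
            for k in range(N)]
```

In the noiseless case the guess does not depend on the phase: `noncoherent_guess(received_noiseless(1, 0.7), S0_BB, S1_BB, FC, DT)` returns the transmitted hypothesis for any `theta`, which is the point of squaring and adding the in-phase and quadrature correlations.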



Method 2: We next obtain the same result by considering the likelihood-ratio
function of the sufficient statistic $(T_0, T_1)$

\[
\frac{f_{T_0,T_1|H=0}(t_0, t_1)}{f_{T_0,T_1|H=1}(t_0, t_1)}.
\]

We begin by arguing that, conditional on $H = 0$, the random variables $T_0$, $T_1$,
and $\Theta$ are independent with

\[
f_{T_0,T_1,\Theta|H=0}(t_0, t_1, \theta) = \frac{1}{2\pi}\, f_{\chi^2_{2,\lambda_1}}(t_0)\, f_{\chi^2_{2,\lambda_0}}(t_1), \tag{27.53}
\]

where $f_{\chi^2_{n,\lambda}}(x)$ denotes the density at $x$ of the noncentral $\chi^2$ distribution with $n$
degrees of freedom and noncentrality parameter $\lambda$ (Section 19.8.2), and where

\[
\lambda_0 = 0 \quad \text{and} \quad \lambda_1 = \frac{E_s}{\sigma^2}.
\]
To prove (27.53) we compute for every $t_0, t_1 \in \mathbb{R}$ and $\theta \in [-\pi, \pi)$

\begin{align*}
f_{T_0,T_1,\Theta|H=0}(t_0, t_1, \theta)
&= f_{\Theta|H=0}(\theta)\, f_{T_0,T_1|H=0,\Theta=\theta}(t_0, t_1) \\
&= \frac{1}{2\pi}\, f_{T_0,T_1|H=0,\Theta=\theta}(t_0, t_1) \\
&= \frac{1}{2\pi}\, f_{T_0|H=0,\Theta=\theta}(t_0)\, f_{T_1|H=0,\Theta=\theta}(t_1) \\
&= \frac{1}{2\pi}\, f_{\chi^2_{2,\lambda_1}}(t_0)\, f_{\chi^2_{2,\lambda_0}}(t_1), \tag{27.54}
\end{align*}

where the first equality follows from the definition of the conditional density; the
second because $\Theta$ is independent of $H$ and is uniformly distributed over the interval
$[-\pi, \pi)$; the third because, conditional on $(H, \Theta) = (0, \theta)$, the random variables
$T_{0,c}$, $T_{0,s}$, $T_{1,c}$, $T_{1,s}$ are independent (Section 27.4), and because $T_0$ is a function
of $(T_{0,c}, T_{0,s})$ whereas $T_1$ is a function of $(T_{1,c}, T_{1,s})$ (see (27.44)); and the final
equality follows because, conditional on $(H, \Theta) = (0, \theta)$, the random variables
$T_{0,c}$, $T_{0,s}$, $T_{1,c}$, $T_{1,s}$ are variance-$\sigma^2$ Gaussians with means specified in (27.41a)
(Section 19.8.2).

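Integrating out $\theta$ in (27.53) we obtain that, conditional on $H$, the random variables
$T_0$ and $T_1$ are independent with

\begin{align*}
f_{T_0,T_1|H=0}(t_0, t_1) &= f_{\chi^2_{2,\lambda_1}}(t_0)\, f_{\chi^2_{2,\lambda_0}}(t_1), \tag{27.55a} \\
f_{T_0,T_1|H=1}(t_0, t_1) &= f_{\chi^2_{2,\lambda_0}}(t_0)\, f_{\chi^2_{2,\lambda_1}}(t_1), \tag{27.55b}
\end{align*}

where the expression for $f_{T_0,T_1|H=1}(t_0, t_1)$ is obtained using analogous steps.

Since $H$ has a uniform prior, an optimal decision rule is thus to guess "H = 0"
whenever

\[
f_{\chi^2_{2,\lambda_1}}(t_0)\, f_{\chi^2_{2,\lambda_0}}(t_1) > f_{\chi^2_{2,\lambda_0}}(t_0)\, f_{\chi^2_{2,\lambda_1}}(t_1).
\]

Since $\lambda_1 > \lambda_0$, this will hold, by Proposition 19.8.3, whenever $t_0 > t_1$. And by the
same proposition the inequality

\[
f_{\chi^2_{2,\lambda_1}}(t_0)\, f_{\chi^2_{2,\lambda_0}}(t_1) < f_{\chi^2_{2,\lambda_0}}(t_0)\, f_{\chi^2_{2,\lambda_1}}(t_1)
\]

will hold whenever $t_0 < t_1$. It is thus optimal to guess "H = 0" whenever $t_0 > t_1$
and to guess "H = 1" whenever $t_0 < t_1$. (It does not matter how we guess when
$t_0 = t_1$.) The decision rule (27.52) has thus been recovered.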
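The conclusion of Method 2 can be illustrated numerically. The sketch below assumes the standard closed form of the noncentral $\chi^2$ density with two degrees of freedom, $f_{\chi^2_{2,\lambda}}(x) = \tfrac{1}{2} e^{-(x+\lambda)/2} I_0(\sqrt{\lambda x})$ for $x \ge 0$, which is not stated in the text above; the function names and the value of `lam1` are hypothetical.

```python
import math


def bessel_i0(x, terms=80):
    """Zeroth-order modified Bessel function via its power series."""
    term, total = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def chi2_2dof_pdf(x, lam):
    """Noncentral chi-square density, 2 degrees of freedom, noncentrality lam."""
    if x < 0.0:
        return 0.0
    return 0.5 * math.exp(-(x + lam) / 2.0) * bessel_i0(math.sqrt(lam * x))


def guess_from_pair(t0, t1, lam1):
    """Compare the two products of densities from (27.55a) and (27.55b), with lambda_0 = 0."""
    under_h0 = chi2_2dof_pdf(t0, lam1) * chi2_2dof_pdf(t1, 0.0)
    under_h1 = chi2_2dof_pdf(t0, 0.0) * chi2_2dof_pdf(t1, lam1)
    return 0 if under_h0 > under_h1 else 1
```

For distinct nonnegative arguments, `guess_from_pair(t0, t1, lam1)` should return 0 exactly when `t0 > t1`, in agreement with (27.52).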

27.6 The Probability of Error 

In this section we compute the probability of error for the optimal guessing rule
(27.52). Since the probability of a tie (i.e., of $T_0 = T_1$) is zero both conditional on
$H = 0$ and conditional on $H = 1$, we shall analyze a slightly simpler guessing rule
that guesses "H = 0" if $T_0 > T_1$, and guesses "H = 1" if $T_1 \ge T_0$.

We begin with the conditional probability of error given that $H = 0$, i.e., with
$\Pr[T_1 \ge T_0 \mid H = 0]$. Conditional on $H = 0$, the question of whether our decoder
errs depends prima facie not only on the realization of the additive noise $(N(t))$
but also on the realization of $\Theta$. But this is not the case because, conditionally on
$H = 0$, the pair $(T_0, T_1)$ is independent of $\Theta$ (see (27.53)), so the realization of $\Theta$
does not play a role in the sense that for every $\theta \in [-\pi, \pi)$

\[
\Pr[T_1 \ge T_0 \mid H = 0, \Theta = \theta] = \Pr[T_1 \ge T_0 \mid H = 0, \Theta = 0]. \tag{27.56}
\]

Conditional on $(H, \Theta) = (0, \theta)$ we have by (27.53) that $T_0$ and $T_1$ are independent
with $T_0 \sim \chi^2_{2,\lambda_1}$ and with $T_1 \sim \chi^2_{2,\lambda_0}$, i.e., with $T_1$ having a mean-2 exponential
distribution (Note 19.8.1)

\[
f_{T_1|H=0,\Theta=\theta}(t_1) = \frac{1}{2}\, e^{-t_1/2}, \quad t_1 \ge 0.
\]

Consequently, for every $\theta \in [-\pi, \pi)$ and $\xi \ge 0$,

\[
\Pr[T_1 \ge \xi \mid H = 0, \Theta = \theta] = \int_{\xi}^{\infty} \frac{1}{2}\, e^{-t/2}\, dt = e^{-\xi/2}. \tag{27.57}
\]

Starting with (27.56) we now have for every $\theta \in [-\pi, \pi)$

\begin{align*}
\Pr[T_1 \ge T_0 \mid H = 0, \Theta = \theta]
&= \Pr[T_1 \ge T_0 \mid H = 0, \Theta = 0] \\
&= \int_{0}^{\infty} f_{T_0|H=0,\Theta=0}(t_0)\, \Pr[T_1 \ge t_0 \mid H = 0, \Theta = 0, T_0 = t_0]\, dt_0 \\
&= \int_{0}^{\infty} f_{T_0|H=0,\Theta=0}(t_0)\, \Pr[T_1 \ge t_0 \mid H = 0, \Theta = 0]\, dt_0 \\
&= \int_{0}^{\infty} f_{T_0|H=0,\Theta=0}(t_0)\, e^{-t_0/2}\, dt_0 \\
&= \mathsf{E}\bigl[ e^{s T_0} \bigm| H = 0, \Theta = 0 \bigr] \Bigr|_{s=-1/2} \\
&= M_{\chi^2_{2,E_s/\sigma^2}}(s) \Bigr|_{s=-1/2} \\
&= \frac{1}{2}\, e^{-\frac{E_s}{4\sigma^2}}, \tag{27.58}
\end{align*}

where the first equality follows from (27.56); the second from (26.88); the third
because conditional on $H = 0$ (and $\Theta = 0$) the random variables $T_0$ and $T_1$
are independent; the fourth from (27.57); the fifth by expressing $\int f_Z(z)\, g(z)\, dz$ as
$\mathsf{E}[g(Z)]$ (with $g(\cdot)$ the exponential function); the sixth by the definition of the MGF
(19.23) and because, conditional on $H = 0$ and $\Theta = 0$, we have that $T_0 \sim \chi^2_{2,E_s/\sigma^2}$;
and the final equality from the explicit expression for the MGF of a $\chi^2_{2,E_s/\sigma^2}$ RV,
i.e., from (19.45) with the substitution $n = 2$ for the number of degrees of freedom,
$\lambda = E_s/\sigma^2$ for the noncentrality parameter, and $s = -1/2$.

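By symmetry we also have for every $\theta \in [-\pi, \pi)$

\[
\Pr[T_0 > T_1 \mid H = 1, \Theta = \theta] = \frac{1}{2}\, e^{-\frac{E_s}{4\sigma^2}}. \tag{27.59}
\]

Thus, if we denote by $p_{\mathrm{MAP}}(\mathrm{error} \mid \Theta = \theta)$ the conditional probability of error of
our decoder conditional on $\Theta = \theta$, then by the uniformity of the prior (27.10) and
by (27.58) & (27.59)

\begin{align*}
p_{\mathrm{MAP}}(\mathrm{error} \mid \Theta = \theta)
&= \Pr[H = 0]\, p_{\mathrm{MAP}}(\mathrm{error} \mid H = 0, \Theta = \theta) + \Pr[H = 1]\, p_{\mathrm{MAP}}(\mathrm{error} \mid H = 1, \Theta = \theta) \\
&= \frac{1}{2} \Pr[T_1 \ge T_0 \mid H = 0, \Theta = \theta] + \frac{1}{2} \Pr[T_0 > T_1 \mid H = 1, \Theta = \theta] \\
&= \frac{1}{2}\, e^{-\frac{E_s}{4\sigma^2}}, \quad \theta \in [-\pi, \pi). \tag{27.60}
\end{align*}

Integrating (27.60) over $\theta$ yields the optimal unconditional probability of error

\[
p^*(\mathrm{error}) = \frac{1}{2}\, e^{-\frac{E_s}{4\sigma^2}}. \tag{27.61}
\]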
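The error probability (27.61) can be checked by simulation. The sketch below is an illustration under stated assumptions rather than the book's procedure: it draws the four Gaussian components directly, taking the conditional means given $(H, \Theta) = (0, \theta)$ to be $\sqrt{E_s}\cos\theta$ and $\sqrt{E_s}\sin\theta$ for the first pair and zero for the second (this is what makes $T_0 \sim \chi^2_{2,E_s/\sigma^2}$), and it conditions on $H = 0$ only, which suffices by symmetry. All function names are hypothetical.

```python
import math
import random


def analytic_error(es, sigma2):
    """Right-hand side of (27.61)."""
    return 0.5 * math.exp(-es / (4.0 * sigma2))


def simulated_error(es, sigma2, trials, seed=0):
    """Monte Carlo estimate of Pr[T1 >= T0 | H = 0] with a uniformly drawn phase."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    errors = 0
    for _ in range(trials):
        theta = rng.uniform(-math.pi, math.pi)
        t0c = math.sqrt(es) * math.cos(theta) + rng.gauss(0.0, sigma)
        t0s = math.sqrt(es) * math.sin(theta) + rng.gauss(0.0, sigma)
        t1c = rng.gauss(0.0, sigma)
        t1s = rng.gauss(0.0, sigma)
        t0 = (t0c ** 2 + t0s ** 2) / sigma2
        t1 = (t1c ** 2 + t1s ** 2) / sigma2
        if t1 >= t0:
            errors += 1
    return errors / trials
```

With, say, `es = 4.0` and `sigma2 = 1.0`, the estimate from `simulated_error(4.0, 1.0, 20000)` should lie close to `analytic_error(4.0, 1.0)`, and fixing `theta` to any constant instead of drawing it should not change the estimate beyond statistical fluctuation.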



Using (27.42), this can also be expressed as

\[
p^*(\mathrm{error}) = \frac{1}{2}\, e^{-\frac{E_s}{2 N_0}}. \tag{27.62}
\]

27.7 Discussion

The detector we derived has the property that its error probability does not depend
on the realization of the nuisance parameter $\Theta$; see (27.60). This property makes
the detector robust with respect to the distribution of $\Theta$: since the conditional
probability of error does not depend on the realization of $\Theta$, neither does the
average performance depend on the distribution of $\Theta$. (Of course, if $\Theta$ is not
uniform, then our decoder need not be optimal.)

We next show that our guessing rule is also conservative in the sense that it
minimizes the worst-case performance:

\[
\sup_{\theta \in [-\pi, \pi)} p(\mathrm{error} \mid \Theta = \theta).
\]

That is, for any guessing rule of conditional error probability $p'(\mathrm{error} \mid \Theta = \theta)$

\[
\sup_{\theta \in [-\pi, \pi)} p'(\mathrm{error} \mid \Theta = \theta) \ge \sup_{\theta \in [-\pi, \pi)} p_{\mathrm{MAP}}(\mathrm{error} \mid \Theta = \theta) = \frac{1}{2}\, e^{-\frac{E_s}{4\sigma^2}}. \tag{27.63}
\]

Thus, while other decoders may outperform our decoder for some realizations of $\Theta$,
for other realizations their probability of error will be at least as high. Indeed, if
$p'(\mathrm{error} \mid \Theta = \theta)$ is the conditional probability of error associated with any guessing
rule, then

\begin{align*}
\sup_{\theta \in [-\pi, \pi)} p'(\mathrm{error} \mid \Theta = \theta)
&\ge \frac{1}{2\pi} \int_{-\pi}^{\pi} p'(\mathrm{error} \mid \Theta = \theta)\, d\theta \\
&\ge \frac{1}{2\pi} \int_{-\pi}^{\pi} p_{\mathrm{MAP}}(\mathrm{error} \mid \Theta = \theta)\, d\theta \\
&= \sup_{\theta \in [-\pi, \pi)} p_{\mathrm{MAP}}(\mathrm{error} \mid \Theta = \theta) \\
&= \frac{1}{2}\, e^{-\frac{E_s}{4\sigma^2}},
\end{align*}

where the first inequality follows because the average (over $\theta$) can never exceed the
supremum; the second inequality because the decoder we designed minimizes the
unconditional probability of error; and the last two equalities follow from (27.60),
i.e., from the fact that the conditional probability of error $p_{\mathrm{MAP}}(\mathrm{error} \mid \Theta = \theta)$ of
our decoder does not depend on $\theta$ and is equal to the RHS of (27.60).

It is interesting to assess the degradation in performance due to our ignorance
of $\Theta$. To that end we now compare the performance of our detector with that
of the "coherent detector." The coherent decoder is an optimal decoder for the
setting where the realization of $\Theta$ is known to the receiver, i.e., when the receiver
can form its guess based on both $(Y(t))$ and $\Theta$. If the receiver knows $\Theta = \theta$, then it
can compute $S_0$ and $S_1$, and the problem reduces to the problem of deciding which
of two equi-energy orthogonal waveforms $S_0$ and $S_1$ is being observed in white
Gaussian noise (the binary version of the problem we discussed in Section 26.11.3).
An optimal decision rule would be

\[
\text{Guess ``$H = 0$'' if } \int Y(t)\, S_0(t)\, dt > \int Y(t)\, S_1(t)\, dt
\]

with resulting probability of error (see (26.93))

\begin{align*}
p_{\mathrm{coherent}}(\mathrm{error} \mid \Theta = \theta)
&= Q\Bigl( \frac{\| S_0 - S_1 \|_2 / 2}{\sigma} \Bigr) \\
&= Q\Bigl( \sqrt{\frac{E_s}{2\sigma^2}} \Bigr) \\
&\approx \frac{1}{\sqrt{\pi E_s / \sigma^2}}\, \exp\Bigl( -\frac{E_s}{4\sigma^2} \Bigr), \quad \frac{E_s}{\sigma^2} \gg 1, \tag{27.64}
\end{align*}

where the approximation follows from (19.18). Integrating over $\theta$ we obtain

\[
p_{\mathrm{coherent}}(\mathrm{error}) \approx \frac{1}{\sqrt{\pi E_s / \sigma^2}}\, \exp\Bigl( -\frac{E_s}{4\sigma^2} \Bigr), \quad \frac{E_s}{\sigma^2} \gg 1. \tag{27.65}
\]

Comparing (27.65) with (27.61) we see that if $E_s/\sigma^2$ is large, then we pay only
a small penalty for not knowing the phase.$^4$ Of course, if the phase were known
precisely we might have used antipodal signaling with the resulting probability of
error being lower; see (26.72).$^5$
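The size of the penalty can be tabulated numerically. The following sketch assumes that the coherent detector's error probability for two equi-energy orthogonal signals of energy $E_s$ in white Gaussian noise of double-sided PSD $\sigma^2 = N_0/2$ is $Q\bigl(\sqrt{E_s/(2\sigma^2)}\bigr)$, and it evaluates $Q(\cdot)$ through the complementary error function; the function names are hypothetical and `snr` stands for the ratio $E_s/\sigma^2$.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def p_noncoherent(snr):
    """Noncoherent error probability (27.61) as a function of snr = Es / sigma^2."""
    return 0.5 * math.exp(-snr / 4.0)


def p_coherent(snr):
    """Coherent error probability under the assumption stated in the lead-in."""
    return q_function(math.sqrt(snr / 2.0))


def p_coherent_approx(snr):
    """Large-snr approximation of the coherent error probability."""
    return math.exp(-snr / 4.0) / math.sqrt(math.pi * snr)


if __name__ == "__main__":
    for snr in (10.0, 20.0, 40.0):
        print(snr, p_noncoherent(snr), p_coherent(snr), p_noncoherent(snr) / p_coherent(snr))
```

Both probabilities decay with the same exponent $E_s/(4\sigma^2)$; their ratio grows only like $\tfrac{1}{2}\sqrt{\pi E_s/\sigma^2}$, which is the sense in which the penalty is small.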



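27.8 Extension to M ≥ 2 Signals

We next briefly address the M-ary version of the problem of noncoherent detection
of orthogonal signals. We now denote the RV to be guessed by $M$ and replace
(27.10) with the assumption that $M$ is uniformly distributed over the set
$\mathcal{M} = \{1, \ldots, \mathsf{M}\}$, where $\mathsf{M} \ge 2$. We wish to guess the value of $M$ based on the
observation $(Y(t))$ (27.11), where $\nu$ now takes value in $\mathcal{M}$ and where the
orthogonality conditions (27.15) & (27.18) are now written as

\[
\langle s_{\nu',\mathrm{BB}}, s_{\nu'',\mathrm{BB}} \rangle = \frac{1}{2}\, E_s\, \mathrm{I}\{\nu' = \nu''\}, \quad \nu', \nu'' \in \mathcal{M}. \tag{27.66}
\]

We first argue that the vector

\[
(T_1, \ldots, T_{\mathsf{M}})^{\mathsf{T}} \tag{27.67}
\]

forms a sufficient statistic, where, in analogy to (27.44), we define

\[
T_\nu = \frac{T_{\nu,c}^2 + T_{\nu,s}^2}{\sigma^2}, \quad \nu \in \mathcal{M},
\]

and where

\[
T_{\nu,c} = \Bigl\langle Y, \frac{s_{\nu,c}}{\sqrt{E_s}} \Bigr\rangle \quad \text{and} \quad T_{\nu,s} = \Bigl\langle Y, \frac{s_{\nu,s}}{\sqrt{E_s}} \Bigr\rangle, \quad \nu \in \mathcal{M}.
\]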

To this end, we first note that it is enough that we show pairwise sufficiency
(Proposition 22.3.2). Pairwise sufficiency can be proved using Proposition 22.4.2
because for every $m' \ne m''$ in $\mathcal{M}$ our analysis of the binary problem shows that
the tuple $(T_{m'}, T_{m''})$ forms a sufficient statistic for testing between $m'$ and $m''$, and
this tuple is computable from the vector in (27.67).

$^4$ Although $p^*(\mathrm{error}) / p^*_{\mathrm{coherent}}(\mathrm{error})$ tends to infinity, it does so only subexponentially.
$^5$ Comparing (26.93) and (26.72) we see that, to achieve the same probability of error, binary
orthogonal keying requires twice as much energy as antipodal signaling.

Our analysis of the binary case shows that, after observing $(Y(t))$, the a posteriori
probability of the event $M = m$ is larger than the a posteriori probability of the
event $M = m'$ whenever $T_m > T_{m'}$. Consequently Message $m$ has the highest a
posteriori probability if $T_m = \max_{m' \in \mathcal{M}} T_{m'}$. Thus, the decision rule



\[
\text{Guess ``$M = m$'' if } T_m = \max_{m' \in \mathcal{M}} T_{m'} \tag{27.68}
\]

is optimal. The probability of a tie is zero, so it does not matter how ties are
resolved.

We next turn to the analysis of the probability of error. We shall assume that
a tie results in an error, so, conditional on $M = m$, an error occurs whenever
$\max\{T_1, \ldots, T_{m-1}, T_{m+1}, \ldots, T_{\mathsf{M}}\} \ge T_m$. We first show that, as in the binary
case, the probability of error associated with this guessing rule depends neither on
the realization of $\Theta$ nor on the message, i.e., that for every $m \in \mathcal{M}$ and $\theta \in [-\pi, \pi)$

\[
p_{\mathrm{MAP}}(\mathrm{error} \mid M = m, \Theta = \theta) = p_{\mathrm{MAP}}(\mathrm{error} \mid M = 1, \Theta = 0). \tag{27.69}
\]

To see this note that, conditional on $(M, \Theta) = (m, \theta)$, the components of the
vector (27.67) are independent, with the $m$-th component being $\chi^2_{2,E_s/\sigma^2}$ and with the
other components being $\chi^2_{2,0}$. Consequently, irrespective of $\theta$ and $m$, the conditional
probability of error is the probability that a $\chi^2_{2,E_s/\sigma^2}$ RV is exceeded by, or
is equal to, at least one of $\mathsf{M} - 1$ IID $\chi^2_{2,0}$ random variables that are independent
of it. In the analysis of the probability of error we shall thus assume that $M = 1$
and that $\Theta = 0$.

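The probability that the maximum among the random variables $T_2, \ldots, T_{\mathsf{M}}$ exceeds
or is equal to $\xi$ is given for every $\xi \ge 0$ by

\begin{align*}
\Pr\bigl[ \max\{T_2, \ldots, T_{\mathsf{M}}\} \ge \xi \bigm| M = 1, \Theta = 0 \bigr]
&= 1 - \Pr\bigl[ \max\{T_2, \ldots, T_{\mathsf{M}}\} < \xi \bigm| M = 1, \Theta = 0 \bigr] \\
&= 1 - \Pr\bigl[ T_2 < \xi, \ldots, T_{\mathsf{M}} < \xi \bigm| M = 1, \Theta = 0 \bigr] \\
&= 1 - \bigl( \Pr[ T_2 < \xi \mid M = 1, \Theta = 0 ] \bigr)^{\mathsf{M}-1} \\
&= 1 - \bigl( 1 - e^{-\xi/2} \bigr)^{\mathsf{M}-1} \\
&= 1 - \sum_{j=0}^{\mathsf{M}-1} (-1)^j \binom{\mathsf{M}-1}{j} e^{-j\xi/2}, \tag{27.70}
\end{align*}

where the first equality follows because the probabilities of an event and of its
complement sum to one; the second because the maximum is smaller than $\xi$ if,
and only if, all the random variables are smaller than $\xi$; the third because,
conditionally on $M = 1$ and $\Theta = 0$, the random variables $T_2, \ldots, T_{\mathsf{M}}$ are IID; the
fourth because conditional on $M = 1$ and $\Theta = 0$, the RV $T_2$ is a mean-2 exponential
(Note 19.8.1); and the final equality follows from the binomial formula (26.91)
with the substitution $a = 1$, $b = -e^{-\xi/2}$, and $n = \mathsf{M} - 1$. The probability of error
is thus:

\begin{align*}
&\Pr\bigl[ \max\{T_2, \ldots, T_{\mathsf{M}}\} \ge T_1 \bigm| M = 1, \Theta = \theta \bigr] \\
&\quad = \Pr\bigl[ \max\{T_2, \ldots, T_{\mathsf{M}}\} \ge T_1 \bigm| M = 1, \Theta = 0 \bigr] \\
&\quad = \int_{0}^{\infty} f_{T_1|M=1,\Theta=0}(t_1)\, \Pr\bigl[ \max\{T_2, \ldots, T_{\mathsf{M}}\} \ge t_1 \bigm| M = 1, \Theta = 0, T_1 = t_1 \bigr]\, dt_1 \\
&\quad = \int_{0}^{\infty} f_{T_1|M=1,\Theta=0}(t_1)\, \Pr\bigl[ \max\{T_2, \ldots, T_{\mathsf{M}}\} \ge t_1 \bigm| M = 1, \Theta = 0 \bigr]\, dt_1 \\
&\quad = \int_{0}^{\infty} f_{T_1|M=1,\Theta=0}(t_1) \Bigl( 1 - \sum_{j=0}^{\mathsf{M}-1} (-1)^j \binom{\mathsf{M}-1}{j} e^{-j t_1/2} \Bigr)\, dt_1 \\
&\quad = 1 - \sum_{j=0}^{\mathsf{M}-1} (-1)^j \binom{\mathsf{M}-1}{j} \int_{0}^{\infty} f_{T_1|M=1,\Theta=0}(t_1)\, e^{-j t_1/2}\, dt_1 \\
&\quad = 1 - \sum_{j=0}^{\mathsf{M}-1} (-1)^j \binom{\mathsf{M}-1}{j}\, \mathsf{E}\bigl[ e^{s T_1} \bigm| M = 1, \Theta = 0 \bigr] \Bigr|_{s=-j/2} \\
&\quad = 1 - \sum_{j=0}^{\mathsf{M}-1} (-1)^j \binom{\mathsf{M}-1}{j}\, M_{\chi^2_{2,E_s/\sigma^2}}(s) \Bigr|_{s=-j/2} \\
&\quad = 1 - \sum_{j=0}^{\mathsf{M}-1} (-1)^j \binom{\mathsf{M}-1}{j} \frac{1}{j+1}\, e^{-\frac{j}{j+1} \frac{E_s}{2\sigma^2}},
\end{align*}

where the justifications are very similar to the justifications of (27.58) except that
we use (27.70) instead of (27.57). Denoting the probability of error by $p^*(\mathrm{error})$
and noting that for $j = 0$ the summand is 1, we have

\[
p^*(\mathrm{error}) = \sum_{j=1}^{\mathsf{M}-1} (-1)^{j+1} \binom{\mathsf{M}-1}{j} \frac{1}{j+1}\, e^{-\frac{j}{j+1} \frac{E_s}{2\sigma^2}}, \tag{27.71}
\]

or, upon recalling that $\sigma^2$ was defined in (27.42) as $N_0/2$,

\[
p^*(\mathrm{error}) = \sum_{j=1}^{\mathsf{M}-1} (-1)^{j+1} \binom{\mathsf{M}-1}{j} \frac{1}{j+1}\, e^{-\frac{j}{j+1} \frac{E_s}{N_0}}. \tag{27.72}
\]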
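The closed form (27.71) is easy to evaluate and to compare with a simulation. The sketch below is illustrative only: it conditions on $M = 1$ and $\Theta = 0$, draws $T_1$ from two variance-$\sigma^2$ Gaussians (assuming the first has mean $\sqrt{E_s}$ and the second mean zero, so that $T_1 \sim \chi^2_{2,E_s/\sigma^2}$), draws the other statistics as mean-2 exponentials, and counts ties as errors. The function names are hypothetical.

```python
import math
import random


def p_error_formula(m, es, sigma2):
    """Evaluate (27.71) for m orthogonal signals."""
    return sum(
        (-1) ** (j + 1) * math.comb(m - 1, j) / (j + 1)
        * math.exp(-(j / (j + 1)) * es / (2.0 * sigma2))
        for j in range(1, m)
    )


def p_error_monte_carlo(m, es, sigma2, trials, seed=1):
    """Estimate Pr[max{T_2, ..., T_m} >= T_1 | M = 1, Theta = 0] by simulation."""
    rng = random.Random(seed)
    sigma = math.sqrt(sigma2)
    errors = 0
    for _ in range(trials):
        t1c = math.sqrt(es) + rng.gauss(0.0, sigma)
        t1s = rng.gauss(0.0, sigma)
        t1 = (t1c ** 2 + t1s ** 2) / sigma2
        # expovariate(0.5) has mean 2, matching the central chi-square with 2 degrees of freedom.
        largest_other = max(rng.expovariate(0.5) for _ in range(m - 1))
        if largest_other >= t1:
            errors += 1
    return errors / trials
```

For `m = 2` the sum has a single term and `p_error_formula(2, es, sigma2)` reduces to the binary result (27.61).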



27.9 Exercises 

Exercise 27.1 (The Conditional Law of the Sufficient Statistic). Conditional on $M = m$,
are the components of the random vector $\mathbf{T}$ in Theorem 27.3.1 independent? What about
conditional on $(M, \mathbf{A}) = (m, \mathbf{a})$ for $m \in \mathcal{M}$ and $\mathbf{a} \in \mathbb{R}^d$?



Exercise 27.2 (A Silly Design Criterion). Let $p(\mathrm{error} \mid \Theta = \theta)$ denote the conditional
probability of error given $\Theta = \theta$ of some decision rule for the setup of Section 27.2. Show
that

\[
\inf_{-\pi \le \theta < \pi} p(\mathrm{error} \mid \Theta = \theta) \ge Q\bigl( \,\ldots\, \bigr).
\]

Can you think of a detector that achieves this bound with equality? Would you
recommend using it?

Exercise 27.3 (A Coherent Detector for an Incoherent Channel). Alice designs a coherent
detector for the setup of Section 27.2 by pretending that $\Theta$ is deterministically equal to
zero and by then using the results on the detection of known signals in white Gaussian
noise. Show that if her detector is used over our channel where $\Theta \sim \mathcal{U}([-\pi, \pi))$, then the
resulting average probability of error (averaged over $\Theta$) is 1/2.

Exercise 27.4 (Noncoherent Antipodal Signaling). Show that if in the setup of
Section 27.2 the baseband signals $s_{0,\mathrm{BB}}$ and $s_{1,\mathrm{BB}}$, rather than being orthogonal, are antipodal
in the sense that $s_{0,\mathrm{BB}} = -s_{1,\mathrm{BB}}$, then the optimal probability of error is 1/2.

Exercise 27.5 (A Fading Scenario). Consider the setup of Section 27.2 but with (27.11)
replaced by $Y(t) = A\, S_\nu(t) + N(t)$, where $A$ is a Rayleigh RV that is independent of
$(H, \Theta, (N(t)))$. Find an optimal detector and the associated probability of error when $A$
is observed by the receiver. Repeat when $A$ is unobserved.

Exercise 27.6 (Uniform Phase Noise Is the Worst Phase Noise). Consider the setup of
Section 27.2 but with $\Theta$ not necessarily uniformly distributed over $[-\pi, \pi)$. Show that
the optimal probability of error is upper-bounded by the optimal probability of error
corresponding to the case where $\Theta \sim \mathcal{U}([-\pi, \pi))$.

Exercise 27.7 (Unknown Frequency-Selective Channel). Let $H$ take on the values 0 and 1
equiprobably, and let $s$ be an integrable signal that is bandlimited to W Hz. When $H = 0$
the transmitted signal is $s$, and when $H = 1$ it is $-s$. Let $U$ take on the values {up, down}
equiprobably and independently of $H$. When $U$ = up the transmitted signal is passed
through a stable filter of impulse response $h_u$; when $U$ = down it is passed through a stable
filter of impulse response $h_d$. At the receiver, white Gaussian noise $(N(t))$ of PSD $N_0/2$
over the bandwidth W is added to the received signal. The noise is independent of $(H, U)$.
Based on the received waveform $(Y(t))$, the receiver wishes to guess $H$. The receiver has
no knowledge of the realization of the switch $U$.

(i) Find a two-dimensional sufficient statistic vector $(T_1, T_2)^{\mathsf{T}}$ for this problem.

(ii) Find a decision rule that minimizes the probability of error. Express your rule
using the function $\phi(x, y; \sigma_x, \sigma_y, \rho)$, which is the value at the point $(x, y)$ of the
joint density of the zero-mean jointly Gaussian random variables $X$, $Y$ of variances
$\sigma_x^2$ and $\sigma_y^2$ and covariance $\mathsf{E}[XY] = \sigma_x \sigma_y \rho$.

Exercise 27.8 (Noncoherent Detection with Two Antennas). Consider the setup of
Section 27.2 but with the signal now received at two antennas. Denote the received signals
by $(Y_1(t))$ and $(Y_2(t))$:

\begin{align*}
Y_1(t) &= 2\,\mathrm{Re}\bigl( s_{\nu,\mathrm{BB}}(t)\, e^{i(2\pi f_c t + \Theta_1)} \bigr) + N_1(t), \quad t \in \mathbb{R}, \\
Y_2(t) &= 2\,\mathrm{Re}\bigl( s_{\nu,\mathrm{BB}}(t)\, e^{i(2\pi f_c t + \Theta_2)} \bigr) + N_2(t), \quad t \in \mathbb{R},
\end{align*}

where the additive white noises $(N_1(t))$ and $(N_2(t))$ at the two antennas are independent.

(i) Suppose that the random phases at the two antennas $\Theta_1$ and $\Theta_2$ are unknown but
identical. Find an optimal detector and the optimal probability of error.

(ii) Assume now that $\Theta_1$ and $\Theta_2$ are independent. Find an optimal guessing rule for $H$.

Exercise 27.9 (Unknown Polarity). Consider the setup of Section 27.2 but with $\Theta$ now
taking on the values $-\pi$ and 0 equiprobably.

(i) Find an optimal decision rule for guessing $H$.

(ii) Bob suggests accounting for the random phase as follows. Pretend that the
transmitted signal is drawn uniformly from the set $\{\pm s_{0,c}, \pm s_{1,c}\}$ and that it is observed
in white Gaussian noise. Feed the received signal to an optimal receiver for guessing
which of these four signals is being observed in white Gaussian noise, and if the
receiver produces the guess "$s_{0,c}$" or "$-s_{0,c}$", declare "H = 0"; otherwise declare
"H = 1". Is Bob's receiver optimal?

Exercise 27.10 (Additional Channel Randomness). Consider the setup of Section 27.2
but when the observed SP $(Y(t), t \in \mathbb{R})$, rather than being given by (27.11), is now given
by

\[
Y(t) = S_\nu(t) + A\, N(t), \quad t \in \mathbb{R},
\]

where $A$ is a positive RV that is independent of $(H, \Theta, (N(t)))$. Find an optimal decision
rule when $A$ is observed. Repeat when $A$ is not observed.

Exercise 27.11 (Mismatched Noncoherent Detection). Suppose that the signal fed to
the detector of Section 27.5 is

\[
2\,\mathrm{Re}\bigl( u_{\mathrm{BB}}(t)\, e^{i(2\pi f_c t + \Theta)} \bigr) + N(t), \quad t \in \mathbb{R},
\]

where $u_{\mathrm{BB}}$ is an integrable signal that is bandlimited to W/2 Hz and that is orthogonal
to $s_{0,\mathrm{BB}}$, and where the other quantities are as defined in Section 27.2. Compute the
probability that the detector produces the guess "H = 0." Express your answer in terms
of the inner product $\langle u_{\mathrm{BB}}, s_{1,\mathrm{BB}} \rangle$, the energy in $u_{\mathrm{BB}}$, and $N_0$.



Chapter 28 

Detecting PAM and QAM Signals in White 
Gaussian Noise 



28.1 Introduction and Setup 

In Chapter 26 we addressed the problem of detecting one of M bandwidth-W sig- 
nals corrupted by additive Gaussian noise that is white with respect to the band- 
width W. Except for assuming that the mean signals are integrable signals that 
are bandlimited to W Hz, we made no assumptions about their structure. In this 
chapter we study the implication of the results of Chapter 26 for Pulse Amplitude 
Modulation, where the mean signals correspond to different possible outputs of a 
PAM modulator. The conclusions we shall draw are extremely important to the 
design of receivers for systems employing PAM. 

The most important result of this chapter is that, loosely speaking, for PAM signals 
contaminated by additive white Gaussian noise, the inner products between the 
received waveform and the time shifts of the pulse shape by integer multiples of the
baud period $T_s$ form a sufficient statistic. Thus, if we feed the received waveform to
a matched filter that is matched to the pulse shape defining the PAM signals, then
the matched filter's outputs sampled at integer multiples of the baud period $T_s$
form a sufficient statistic (Theorem 5.8.2). Using this result we can reduce the
guessing problem from one with an observation consisting of a continuous-time 
stochastic process to one with an observation consisting of a discrete-time SP. 
In fact, since we shall only consider the problem of detecting a finite number of 
data bits, the reduction will be to a finite number of random variables. This will 
justify the canonical structure of a PAM receiver where the received continuous- 
time waveform is fed to a matched filter whose sampled output is then used by the 
decision circuitry to produce its guess. We shall derive the results first for PAM 
and then briefly describe their extension to QAM in Section 28.5. 

The setup we study is one where $k$ data bits $D_1, \ldots, D_k$ are mapped by an encoder
$\varphi\colon \{0,1\}^k \to \mathbb{R}^n$ to the real symbols $X_1, \ldots, X_n$, which are then used to produce
the transmitted waveform

\[
X(t) = A \sum_{\ell=1}^{n} X_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}, \tag{28.1}
\]

where $A > 0$ is a scaling constant; $T_s > 0$ is the baud period; and $g(\cdot)$ is the pulse
shape, which is assumed to be a real integrable signal that is bandlimited to W
Hz. The received waveform $(Y(t))$ is given by

\begin{align*}
Y(t) &= X(t) + N(t) \\
&= A \sum_{\ell=1}^{n} X_\ell\, g(t - \ell T_s) + N(t), \quad t \in \mathbb{R}, \tag{28.2}
\end{align*}

where $(N(t))$ is white Gaussian noise of PSD $N_0/2$ with respect to the
bandwidth W and is independent of the data bits $D_1, \ldots, D_k$ and hence also of $(X(t))$.
Based on the received waveform $(Y(t))$ we wish to guess the data bits $D_1, \ldots, D_k$.

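To simplify the typesetting we shall stack the $k$ data bits $D_1, \ldots, D_k$ in a vector

\[
\mathbf{D} = (D_1, \ldots, D_k)^{\mathsf{T}}, \tag{28.3}
\]

stack the $n$ symbols $X_1, \ldots, X_n$ in a vector

\[
\mathbf{X} = (X_1, \ldots, X_n)^{\mathsf{T}}, \tag{28.4}
\]

and write

\[
\mathbf{X} = \varphi(\mathbf{D}). \tag{28.5}
\]

We denote the transmitted waveform corresponding to the realization $\mathbf{D} = \mathbf{d}$ by

\[
x(t; \mathbf{d}) = A \sum_{\ell=1}^{n} x_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}, \tag{28.6}
\]

where $(x_1, \ldots, x_n)^{\mathsf{T}} = \varphi(\mathbf{d})$ is the real $n$-vector to which $\mathbf{d}$ is mapped by $\varphi(\cdot)$.
Thus, conditional on $\mathbf{D} = \mathbf{d}$,

\[
Y(t) = x(t; \mathbf{d}) + N(t), \quad t \in \mathbb{R}. \tag{28.7}
\]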

28.2 Sufficient Statistic and Its Conditional Law 

We can view the vector $\mathbf{D} = (D_1, \ldots, D_k)^{\mathsf{T}}$ as a message and view the $2^k$ different
values it can take as the set of messages. To promote this view we define

\[
\mathcal{D} = \{0, 1\}^k \tag{28.8}
\]

to be the set of all $2^k$ binary $k$-tuples and view $\mathcal{D}$ as the set of possible messages.
While in Chapter 21 on multi-hypothesis testing we always denoted the set of
messages by $\mathcal{M}$ and assumed that its elements are the integers $1, \ldots, \mathsf{M}$, we never
attached a meaning to the "labels" we associated with the messages. So there is no
harm in now labeling the messages by the binary $k$-tuples. Associated with every
message $\mathbf{d} \in \mathcal{D}$ is its prior $\pi_{\mathbf{d}}$

\begin{align*}
\pi_{\mathbf{d}} &= \Pr[\mathbf{D} = \mathbf{d}] \\
&= \Pr[D_1 = d_1, \ldots, D_k = d_k], \quad \mathbf{d} \in \mathcal{D}. \tag{28.9}
\end{align*}

If we assume that the data bits are IID random bits (Definition 14.5.1), then
$\pi_{\mathbf{d}} = 2^{-k}$ for every $k$-tuple $\mathbf{d} \in \mathcal{D}$, but this assumption is inessential to our
derivation of the sufficient statistic. (Recall that sufficiency is defined for a family
of conditional distributions; the prior plays no role.)

Conditional on $\mathbf{D} = \mathbf{d}$, the transmitted waveform is given by $x(\cdot\,; \mathbf{d})$; see (28.6).
Thus, the problem of guessing $\mathbf{D}$ is equivalent to guessing which of the $2^k$ signals

\[
\bigl\{ t \mapsto x(t; \mathbf{d}) \bigr\}_{\mathbf{d} \in \mathcal{D}} \tag{28.10}
\]

is being observed in white Gaussian noise of PSD $N_0/2$ with respect to the
bandwidth W. From (28.6) it follows that for every message $\mathbf{d} \in \mathcal{D}$ the transmitted
waveform $t \mapsto x(t; \mathbf{d})$ is a (deterministic) linear combination of the $n$ functions
$\{ t \mapsto g(t - \ell T_s) \}_{\ell=1}^{n}$. Moreover, if the pulse shape $g(\cdot)$ is an integrable function
that is bandlimited to W Hz, then so is each waveform $t \mapsto x(t; \mathbf{d})$. Consequently,
from Corollary 26.4.2 and from (26.23) we obtain:

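Proposition 28.2.1 (Sufficient Statistic for PAM in White Noise). Let the
conditional law of $(Y(t))$ given $\mathbf{D} = \mathbf{d}$ be given by (28.5), (28.6), and (28.7), where
the pulse shape $g$ is a real integrable signal that is bandlimited to W Hz, and
where $(N(t))$ is white Gaussian noise of PSD $N_0/2$ with respect to the
bandwidth W. Then the $n$ inner products

\[
T^{(\ell)} = \int_{-\infty}^{\infty} Y(t)\, g(t - \ell T_s)\, dt, \quad \ell \in \{1, \ldots, n\} \tag{28.11}
\]

form a sufficient statistic for guessing $\mathbf{D}$ based on $(Y(t))$.

Moreover, conditional on $\mathbf{D} = \mathbf{d}$, the vector $\mathbf{T} = (T^{(1)}, \ldots, T^{(n)})^{\mathsf{T}}$ is a Gaussian
$n$-vector whose $\ell$-th component $T^{(\ell)}$ is of conditional mean

\[
\mathsf{E}\bigl[ T^{(\ell)} \bigm| \mathbf{D} = \mathbf{d} \bigr] = A \sum_{\ell'=1}^{n} x_{\ell'}\, R_{gg}\bigl( (\ell - \ell') T_s \bigr), \quad \ell \in \{1, \ldots, n\} \tag{28.12}
\]

and whose conditional covariance matrix is

\[
\frac{N_0}{2}
\begin{pmatrix}
R_{gg}(0) & R_{gg}(T_s) & \cdots & R_{gg}\bigl((n-1)T_s\bigr) \\
R_{gg}(T_s) & R_{gg}(0) & \cdots & R_{gg}\bigl((n-2)T_s\bigr) \\
\vdots & \vdots & \ddots & \vdots \\
R_{gg}\bigl((n-1)T_s\bigr) & R_{gg}\bigl((n-2)T_s\bigr) & \cdots & R_{gg}(0)
\end{pmatrix}, \tag{28.13}
\]

i.e.,

\[
\mathrm{Cov}\bigl[ T^{(\ell')}, T^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr] = \frac{N_0}{2}\, R_{gg}\bigl( (\ell' - \ell'') T_s \bigr), \quad \ell', \ell'' \in \{1, \ldots, n\}. \tag{28.14}
\]

Here $R_{gg}$ is the self-similarity function of the real pulse shape $g$ (Definition 11.2.1),
and $(x_1, \ldots, x_n)^{\mathsf{T}} = \varphi(\mathbf{d})$ is the real $n$-tuple to which $\mathbf{d}$ is encoded.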
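The conditional mean (28.12) can be illustrated with a toy pulse whose time shifts overlap. This is a sketch under simplifying assumptions that are not in the text: the pulse is a unit-energy rectangle of duration $2T_s$ (time-limited, hence not bandlimited), the noise is set to zero so that the inner products (28.11) equal their conditional means, and the integrals are computed by a midpoint Riemann sum. All names and numerical values are hypothetical.

```python
import math

TS = 1.0  # baud period (hypothetical value)


def g(t):
    """Toy pulse: rectangle of height 1/sqrt(2) on [0, 2*TS); unit energy when TS = 1."""
    return 1.0 / math.sqrt(2.0) if 0.0 <= t < 2.0 * TS else 0.0


def r_gg(tau):
    """Self-similarity function of the toy pulse in closed form."""
    return 0.5 * max(0.0, 2.0 * TS - abs(tau))


def pam_waveform(t, amp, symbols):
    """x(t; d) = A * sum_l x_l g(t - l*TS), with l running from 1 to n."""
    return amp * sum(x * g(t - (l + 1) * TS) for l, x in enumerate(symbols))


def inner_products(amp, symbols, dt=1e-3):
    """Noiseless versions of the statistics in (28.11), by midpoint Riemann sums."""
    n = len(symbols)
    steps = round((n + 3) * TS / dt)
    stats = [0.0] * n
    for k in range(steps):
        t = (k + 0.5) * dt
        xt = pam_waveform(t, amp, symbols)
        for l in range(n):
            stats[l] += xt * g(t - (l + 1) * TS) * dt
    return stats


def conditional_means(amp, symbols):
    """Right-hand side of (28.12)."""
    n = len(symbols)
    return [amp * sum(symbols[lp] * r_gg((l - lp) * TS) for lp in range(n))
            for l in range(n)]
```

Because adjacent shifts of this pulse overlap, each statistic mixes neighboring symbols: with `amp = 2.0` and `symbols = [1, -1, 1, 1]` the two functions should return the same four numbers, none of which is simply `amp` times a single symbol.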



Proof. This follows directly from Corollary 26.4.2 and from (26.23) upon
substituting the mapping $t \mapsto g(t - \ell T_s)$ for $\tilde{s}_\ell$ and upon computing the inner product

\[
\bigl\langle t \mapsto g(t - \ell T_s),\; t \mapsto g(t - \ell' T_s) \bigr\rangle = R_{gg}\bigl( (\ell - \ell') T_s \bigr), \quad \ell, \ell' \in \mathbb{Z}. \qquad \Box
\]

28.3 Consequences of Sufficiency and Other Optimality Criteria

The sufficiency of the random vector $\mathbf{T} = (T^{(1)}, \ldots, T^{(n)})^{\mathsf{T}}$ and Theorem 26.3.2
guarantee that if our design objective is to minimize the probability of a message
error, then there is no loss in optimality in basing our guess on $\mathbf{T}$. We shall next
consider other design criteria and show that, for these too, there is no loss in
optimality in basing our guess on $\mathbf{T}$.

We first elaborate on what a message error is. If we denote our guess by

\[
\hat{\mathbf{d}} = (\hat{d}_1, \ldots, \hat{d}_k)^{\mathsf{T}},
\]

then a message error occurs if our guess differs from the message $\mathbf{d}$ in at least one
component, i.e., if $\hat{d}_\ell \ne d_\ell$ for some $\ell \in \{1, \ldots, k\}$. The probability of a message
error is thus

\[
\Pr[\hat{\mathbf{D}} \ne \mathbf{D}]. \tag{28.15}
\]

Designing the receiver to minimize the probability of a message error is reasonable,
for example, when the $k$ data bits constitute a computer file, and we wish to
minimize the probability that the file is corrupted. In such applications the user is
often only interested in knowing whether the file was successfully received (no error
occurred) or if the file was corrupted (at least one error occurred). Minimizing the
probability of a message error corresponds to minimizing the probability that the
file is corrupted.

In other applications, engineers are more interested in the average probability
of a bit error or bit error rate (BER). That is, they may wish to minimize

\[
\frac{1}{k} \sum_{j=1}^{k} \Pr[\hat{D}_j \ne D_j]. \tag{28.16}
\]

To better appreciate the difference between the average probability of a bit error
(28.16) and the probability of a message error (28.15), define the RV

\[
E_j = \mathrm{I}\{\hat{D}_j \ne D_j\}, \quad j \in \{1, \ldots, k\},
\]

which indicates whether the $j$-th bit was incorrectly decoded. Minimizing the
probability of a message error minimizes

\[
\Pr\Bigl[ \sum_{j=1}^{k} E_j > 0 \Bigr], \tag{28.17}
\]

whereas minimizing the average probability of a bit error minimizes

\[
\frac{1}{k}\, \mathsf{E}\Bigl[ \sum_{j=1}^{k} E_j \Bigr].
\]

Thus, minimizing the probability of a message error is equivalent to minimizing the
probability that one or more of the data bits is corrupted, whereas minimizing the
average probability of a bit error is equivalent to minimizing the expected number
of data bits that are decoded erroneously.
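The difference between the two criteria shows up already in empirical averages. The sketch below computes, for a batch of transmitted and decoded $k$-bit blocks, the fraction of blocks with at least one bit error and the average fraction of erroneous bits; these are empirical counterparts of (28.15) and (28.16), not the probabilities themselves, and the function names and sample data are hypothetical.

```python
def message_error_rate(sent_blocks, guessed_blocks):
    """Fraction of blocks in which at least one bit was decoded incorrectly."""
    wrong = sum(1 for d, d_hat in zip(sent_blocks, guessed_blocks) if tuple(d) != tuple(d_hat))
    return wrong / len(sent_blocks)


def bit_error_rate(sent_blocks, guessed_blocks):
    """Average over blocks of the fraction of incorrectly decoded bits."""
    total = 0.0
    for d, d_hat in zip(sent_blocks, guessed_blocks):
        indicators = [1 if a != b else 0 for a, b in zip(d, d_hat)]  # the E_j of the text
        total += sum(indicators) / len(d)
    return total / len(sent_blocks)


SENT = [(0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 0, 1), (1, 0, 1, 1)]
GUESSED = [(0, 1, 1, 0), (1, 0, 0, 0), (1, 1, 0, 1), (1, 0, 1, 1)]
```

In this sample two of the four blocks contain errors while only three of the sixteen bits are wrong, so the two figures of merit differ considerably.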

We next argue that there is no loss in optimality in basing our guess on $\mathbf{T}$ also
when designing to minimize the average probability of a bit error (28.16). We first
note that to minimize (28.16) we should choose for each $j \in \{1, \ldots, k\}$ our guess $\hat{D}_j$
to minimize

\[
\Pr[\hat{D}_j \ne D_j].
\]

That is, we should consider the binary hypothesis testing problem of guessing
whether $D_j$ is equal to zero or one, and we should guess $D_j$ to minimize the
probability of error associated with this problem. To conclude our argument we
next show that for the purpose of minimizing $\Pr[\hat{D}_j \ne D_j]$, there is no loss in
optimality in basing our decision on $\mathbf{T}$. To show this, it suffices, by the binary
version of Theorem 26.3.2, to establish that $\mathbf{T}$ also forms a sufficient statistic for
guessing $D_j$ based on $(Y(t))$. That is, we need to show that for every $\eta \in \mathbb{N}$ and
any choice of the epochs $t_1, \ldots, t_\eta \in \mathbb{R}$, the vector $\mathbf{T}$ forms a sufficient statistic
for guessing $D_j$ based on $(Y(t_1), \ldots, Y(t_\eta), \mathbf{T})$. This follows from the sufficiency
of $\mathbf{T}$ for guessing $\mathbf{D}$ based on $(Y(t_1), \ldots, Y(t_\eta), \mathbf{T})$ and from Proposition 22.4.4,
which shows that the sufficiency of $\mathbf{T}$ for guessing $\mathbf{D}$ also implies its sufficiency for
guessing whether $\mathbf{D}$ is in the set of $k$-tuples whose $j$-th component is zero or in its
complement set of $k$-tuples whose $j$-th component is one.

More generally we have: 

Proposition 28.3.1. Consider the setup of Proposition 28.2.1. Let $\psi\colon \mathbf{d} \mapsto \psi(\mathbf{d})$
be any function of the data bits, and let $\mathbf{D}$ have an arbitrary prior. Then no
guessing rule for guessing $\psi(\mathbf{D})$ based on $(Y(t))$ can outperform an optimal rule
for guessing $\psi(\mathbf{D})$ based on $T^{(1)}, \ldots, T^{(n)}$.

Proof. Any function from $\{0,1\}^k$ can take on at most $2^k$ different values. Let $q$
denote the number of different values that $\psi(\cdot)$ takes, i.e.,

\[
q = \# \bigl\{ \psi(\mathbf{d}) : \mathbf{d} \in \{0,1\}^k \bigr\},
\]

where $\# \mathcal{A}$ denotes the number of elements in the set $\mathcal{A}$. Denote these different
values by $\gamma_1, \ldots, \gamma_q$. The $q$ subsets of $\mathcal{D}$

\[
\bigl\{ \mathbf{d} \in \{0,1\}^k : \psi(\mathbf{d}) = \gamma_\kappa \bigr\}, \quad \kappa \in \{1, \ldots, q\}
\]

are disjoint sets whose union is $\{0,1\}^k$. That is, they form a partition of $\{0,1\}^k$.
Guessing $\psi(\mathbf{D})$ is equivalent to guessing which subset in this partition contains $\mathbf{D}$.
For this we know that $(T^{(1)}, \ldots, T^{(n)})$ forms a sufficient statistic because it forms
a sufficient statistic for guessing $\mathbf{D}$ and hence, by Note 22.4.5, it also forms a
sufficient statistic for guessing which subset in the partition contains $\mathbf{D}$. The
result now follows from Theorem 26.3.2. □

The examples we have seen so far correspond to the case where $\psi\colon \mathbf{d} \mapsto \mathbf{d}$ (with the
probability of guessing $\psi(\mathbf{D})$ incorrectly corresponding to a message error) and the
case $\psi\colon \mathbf{d} \mapsto d_j$ (with the probability of guessing $\psi(\mathbf{D})$ incorrectly corresponding
to the probability that the $j$-th bit $D_j$ is incorrectly decoded). Another useful
example is when $\psi\colon \mathbf{d} \mapsto (d_\nu, \ldots, d_{\nu'})$ for some given $\nu, \nu' \in \mathbb{N}$ satisfying $\nu' \ge \nu$.
This situation corresponds to the case where $(D_\nu, \ldots, D_{\nu'})$ constitutes a packet
and we are interested in the probability that the packet is erroneously decoded.

Figure 28.1: Block-mode encoding. (Each block of bits is fed to $\mathrm{enc}(\cdot)$, producing
$\mathrm{enc}(D_1, \ldots, D_{\mathsf{K}})$, $\mathrm{enc}(D_{\mathsf{K}+1}, \ldots, D_{2\mathsf{K}})$, $\ldots$, $\mathrm{enc}(D_{k-\mathsf{K}+1}, \ldots, D_k)$.)

Yet another example arises in block-mode transmission (which is described in Section 10.4
and which is depicted in Figure 28.1), where the data bits $D_1, \ldots, D_k$ are
mapped to the symbols $X_1, \ldots, X_n$ using a $(K, N)$ binary-to-reals block encoder

\[ \mathrm{enc}\colon \{0,1\}^K \to \mathbb{R}^N. \]

Here we assume that $k$ is divisible by $K$ and that $n = N k / K$.

If we wish to guess the $K$-tuple $\bigl(D_{(\nu-1)K+1}, \ldots, D_{(\nu-1)K+K}\bigr)$ with the
smallest probability of error, then there is no loss in optimality in basing our guess on
$T^{(1)}, \ldots, T^{(n)}$. This follows by applying Proposition 28.3.1 with the function
$\psi(\mathbf{d}) = \bigl(d_{(\nu-1)K+1}, \ldots, d_{(\nu-1)K+K}\bigr)$.
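Block-mode encoding and the block-extracting function $\psi$ can be sketched as follows. This is a minimal illustration under stated assumptions: the names `block_mode_encode`, `nu_th_block` and `toy_enc` are hypothetical, and the $(2, 3)$ encoder (append a parity bit, then map $0 \mapsto +1$ and $1 \mapsto -1$) is chosen here only to have something concrete to run.

```python
def block_mode_encode(bits, K, enc):
    """Map k data bits to n = N*k/K symbols by encoding consecutive K-tuples."""
    if len(bits) % K != 0:
        raise ValueError("k must be divisible by K")
    symbols = []
    for start in range(0, len(bits), K):
        symbols.extend(enc(tuple(bits[start:start + K])))
    return symbols


def nu_th_block(bits, nu, K):
    """The function psi of the text: extract the nu-th K-tuple (nu = 1, 2, ...)."""
    return tuple(bits[(nu - 1) * K: nu * K])


def toy_enc(block):
    """Hypothetical (2, 3) encoder: append a parity bit, then map 0 -> +1, 1 -> -1."""
    d1, d2 = block
    return [1.0 - 2.0 * b for b in (d1, d2, d1 ^ d2)]


data = [0, 1, 1, 1]                            # k = 4 data bits
symbols = block_mode_encode(data, 2, toy_enc)  # n = 3 * 4 / 2 = 6 symbols
second_block = nu_th_block(data, 2, 2)         # psi(d) for nu = 2
```

With $k = 4$, $K = 2$ and $N = 3$ the encoder produces $n = Nk/K = 6$ real symbols, three per data block.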

28.4 Consequences of Orthonormality 

The conditional distribution of the inner products in (28.11) becomes simpler when 
the time shifts of the pulse shape by integer multiples of T s are orthonormal. In 
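this case we denote the pulse shape by $\phi(\cdot)$ and state the orthonormality condition
as

\[ \int_{-\infty}^{\infty} \phi(t - \ell T_s)\, \phi(t - \ell' T_s)\, dt = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}, \tag{28.18} \]

or, equivalently, as

\[ R_{\phi\phi}(\ell T_s) = \begin{cases} 1 & \text{if } \ell = 0, \\ 0 & \text{if } \ell \neq 0, \end{cases} \quad \ell \in \mathbb{Z}. \tag{28.19} \]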

28.4.1 The Conditional Law of the Sufficient Statistic 

From Proposition 28.2.1 we obtain a key result on PAM communication in white
Gaussian noise:

Corollary 28.4.1. Consider PAM where data bits $D_1, \ldots, D_k$ are mapped by an
encoder to the real symbols $X_1, \ldots, X_n$, which are then mapped to the waveform

\[ X(t) = A \sum_{\ell=1}^{n} X_\ell\, \phi(t - \ell T_s), \quad t \in \mathbb{R}, \tag{28.20} \]

where the pulse shape $\phi(\cdot)$ is an integrable signal that is bandlimited to W Hz and
whose time shifts by integer multiples of the baud period $T_s$ are orthonormal. Let
the observed waveform $\bigl(Y(t)\bigr)$ be given by

\[ Y(t) = X(t) + N(t), \quad t \in \mathbb{R}, \]

where $\bigl(N(t), t \in \mathbb{R}\bigr)$ is independent of the data bits and is white Gaussian
noise of PSD $N_0/2$ with respect to the bandwidth W.

(i) The $n$ inner products

\[ T^{(\ell)} = \int_{-\infty}^{\infty} Y(t)\, \phi(t - \ell T_s)\, dt, \quad \ell \in \{1, \ldots, n\} \tag{28.21} \]

form a sufficient statistic for guessing $(D_1, \ldots, D_k)$ based on $\bigl(Y(t)\bigr)$.

(ii) Conditional on $\mathbf{D} = \mathbf{d}$ with corresponding encoder outputs
$(X_1, \ldots, X_n) = (x_1, \ldots, x_n)$, the inner products (28.21) are independent with

\[ T^{(\ell)} \sim \mathcal{N}\Bigl(A x_\ell, \frac{N_0}{2}\Bigr), \quad \ell \in \{1, \ldots, n\}. \tag{28.22} \]

(iii) The conditional distribution of these inner products can also be expressed as

\[ T^{(\ell)} = A x_\ell + Z_\ell, \quad \ell \in \{1, \ldots, n\}, \tag{28.23a} \]

where

\[ Z_1, \ldots, Z_n \sim \text{IID } \mathcal{N}\Bigl(0, \frac{N_0}{2}\Bigr). \tag{28.23b} \]

From Proposition 28.3.1 we obtain that $T^{(1)}, \ldots, T^{(n)}$ also form a sufficient statistic
for guessing the value of any function of the data bits $D_1, \ldots, D_k$.
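The representation (28.23) lends itself to simulation. The following sketch, using only Python's standard library, draws the inner products directly in discrete time; the function name `matched_filter_outputs` and the numerical values of $A$, $N_0$ and the symbols are assumptions made for illustration, not values from the text.

```python
import math
import random


def matched_filter_outputs(x, A, N0, rng):
    """Draw T^(l) = A*x_l + Z_l with Z_1, ..., Z_n IID N(0, N0/2), as in (28.23)."""
    sigma = math.sqrt(N0 / 2.0)
    return [A * x_l + rng.gauss(0.0, sigma) for x_l in x]


rng = random.Random(0)          # fixed seed so the run is repeatable
A, N0 = 2.0, 0.5                # hypothetical amplitude and noise level
x = [+1.0, -1.0, -1.0, +1.0]    # hypothetical encoder outputs
t = matched_filter_outputs(x, A, N0, rng)

# Empirical check of the noise variance N0/2 from many draws of Z = T - A*x.
noise = [matched_filter_outputs([0.0], A, N0, rng)[0] for _ in range(20000)]
empirical_variance = sum(z * z for z in noise) / len(noise)
```

Because the sufficient statistic has this simple form, a simulation never needs to generate the continuous-time waveform; the empirical variance should come out close to $N_0/2$.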

28.4.2 A Further Reduction in the Sufficient Statistic 

We next show a further reduction (from $n$ to $N$ random variables) of the sufficient
statistic in block-mode transmission (with the pulse shape $\phi(\cdot)$ still satisfying
(28.19)). For this reduction to hold we need to assume that the data bits are
independent or that the $k/K$ tuples

\[ (D_1, \ldots, D_K),\ (D_{K+1}, \ldots, D_{2K}),\ \ldots,\ (D_{k-K+1}, \ldots, D_k) \tag{28.24} \]

are independent.

Proposition 28.4.2. In addition to the assumptions of Corollary 28.4.1, assume
that $X_1, \ldots, X_n$ are generated from $D_1, \ldots, D_k$ in block-mode using a $(K, N)$
binary-to-reals block encoder. Further assume that the $K$-tuples in (28.24) are independent.
Then for every $\nu \in \{1, \ldots, k/K\}$, the $N$-tuple

\[ \bigl( T^{((\nu-1)N+1)}, \ldots, T^{(\nu N)} \bigr) \tag{28.25} \]

forms a sufficient statistic for guessing the $K$-tuple

\[ \bigl( D_{(\nu-1)K+1}, \ldots, D_{\nu K} \bigr) \tag{28.26} \]

or any function thereof.

Proof. Fix some $\nu \in \{1, \ldots, k/K\}$. For every choice of $\eta \in \mathbb{N}$ and of the
epochs $t_1, \ldots, t_\eta \in \mathbb{R}$, the $n$-tuple of matched filter outputs
$\mathbf{T} = \bigl(T^{(1)}, \ldots, T^{(n)}\bigr)$ forms a sufficient statistic for guessing
$D_1, \ldots, D_k$ based on $\bigl(Y(t_1), \ldots, Y(t_\eta), \mathbf{T}\bigr)$
(Proposition 28.2.1). Consequently, by Note 22.4.5, this $n$-tuple is also sufficient for
guessing the $K$-tuple (28.26). We shall next show that the $N$-tuple (28.25) is sufficient for
guessing the $K$-tuple (28.26) based on the $n$-tuple $\bigl(T^{(1)}, \ldots, T^{(n)}\bigr)$. It will
then follow from Proposition 22.4.3 that the $N$-tuple (28.25) is also sufficient for guessing
the $K$-tuple (28.26) based on $\bigl(Y(t_1), \ldots, Y(t_\eta), \mathbf{T}\bigr)$, thus establishing
the proposition.

That the $N$-tuple (28.25) is sufficient for guessing the $K$-tuple (28.26) based on the
$n$-tuple $\bigl(T^{(1)}, \ldots, T^{(n)}\bigr)$ is equivalent to the irrelevancy of

\[ \mathbf{R} \triangleq \Bigl( \bigl(T^{(1)}, \ldots, T^{(N)}\bigr), \ldots, \bigl(T^{((\nu-2)N+1)}, \ldots, T^{((\nu-1)N)}\bigr), \bigl(T^{(\nu N+1)}, \ldots, T^{((\nu+1)N)}\bigr), \ldots, \bigl(T^{(n-N+1)}, \ldots, T^{(n)}\bigr) \Bigr)^{\mathsf{T}} \]

for guessing the $K$-tuple (28.26) based on the $N$-tuple (28.25). To prove this irrelevancy,
it suffices to prove two claims: that $\mathbf{R}$ is independent of the $K$-tuple (28.26)
and that, conditionally on this $K$-tuple, $\mathbf{R}$ is independent of the $N$-tuple (28.25)
(Proposition 22.5.5). These claims follow from three observations: that, by the
orthonormality assumption (28.19), $\mathbf{R}$ is determined by the data bits

\[ D_1, \ldots, D_{(\nu-1)K},\ D_{\nu K+1}, \ldots, D_k \tag{28.27} \]

and by the random variables

\[ Z_1, \ldots, Z_{(\nu-1)N},\ Z_{\nu N+1}, \ldots, Z_n; \tag{28.28} \]

that the $N$-tuple (28.25) is determined by the $K$-tuple (28.26) and by the random
variables

\[ Z_{(\nu-1)N+1}, \ldots, Z_{\nu N}; \tag{28.29} \]

and that the tuples in (28.26), (28.27), (28.28), and (28.29) are independent.

Having established that the $N$-tuple (28.25) forms a sufficient statistic for guessing
the $K$-tuple (28.26), it now follows, using arguments very similar to those employed
in proving Proposition 28.3.1, that the $N$-tuple (28.25) is also sufficient for guessing
the value of any function $\psi(\cdot)$ of the $K$-tuple (28.26). □

28.4.3 The Discrete-Time Single-Block Model

Proposition 28.4.2 is the starting point of much of the literature on block codes, 
upon which we shall touch in Chapter 29. In Coding Theory N is usually called the 
blocklength, and K/N is called the rate in bits per dimension. Coding theorists 
envision that the function enc(-) is used to map k bits to n real numbers using the 
block-encoding rule of Figure 10.1 (with k being divisible by K) and that the result- 
ing real symbols are then transmitted over a white Gaussian noise channel using 
PAM with a pulse shape satisfying the orthonormality condition (28.19). Assuming
that the data tuples are independent, and by then resorting to Proposition 28.4.2, 
coding theorists focus on the problem of decoding the K-tuple (28.26) from the N 
matched filter outputs (28.25). 

In this problem the index $\nu$ of the block is immaterial, and coding theorists relabel
the data bits of the $K$-tuple (28.26) as $D_1, \ldots, D_K$; they relabel the symbols
to which they are mapped as $X_1, \ldots, X_N$; and they relabel the corresponding
observations as $Y_1, \ldots, Y_N$. The resulting model is the discrete-time single-block
model where

\begin{align}
(X_1, \ldots, X_N) &= \mathrm{enc}(D_1, \ldots, D_K), \tag{28.30a} \\
Y_\eta &= A X_\eta + Z_\eta, \quad \eta \in \{1, \ldots, N\}, \tag{28.30b} \\
Z_\eta &\sim \mathcal{N}\Bigl(0, \frac{N_0}{2}\Bigr), \quad \eta \in \{1, \ldots, N\}, \tag{28.30c}
\end{align}

where $Z_1, \ldots, Z_N$ are IID and independent of $D_1, \ldots, D_K$. We recall that this
model is appropriate when the pulse shape $\phi$ satisfies the orthonormality condition
(28.18); the data bits are "block IID" in the sense that the $k/K$ tuples in
(28.24) are independent; and the additive noise is white Gaussian noise of double-sided
spectral density $N_0/2$ with respect to the bandwidth occupied by the pulse
shape $\phi$. It is customary to additionally assume that $D_1, \ldots, D_K$ are IID random
bits (Definition 14.5.1). This is a good assumption if, prior to transmission, the
data bits are compressed using an efficient data compression algorithm.

28.5.1 Introduction and Setup

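We next extend our discussion to the detection of QAM signals. We assume that
an encoding function

\[ \varphi\colon \{0,1\}^k \to \mathbb{C}^n \]

is used to map the $k$ data bits $\mathbf{D} = (D_1, \ldots, D_k)^{\mathsf{T}}$ to the $n$ complex
symbols $\mathbf{C} = (C_1, \ldots, C_n)^{\mathsf{T}}$ and that the resulting complex symbols are then
mapped to the passband signal $\bigl(X_{\mathrm{PB}}(t)\bigr)$, which is given by

\[ X_{\mathrm{PB}}(t) = 2 \operatorname{Re}\bigl( X_{\mathrm{BB}}(t)\, e^{i 2\pi f_c t} \bigr), \quad t \in \mathbb{R}, \]

where

\[ X_{\mathrm{BB}}(t) = A \sum_{\ell=1}^{n} C_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}; \]

the pulse shape $g(\cdot)$ is a complex integrable signal that is bandlimited to W/2 Hz;
$A > 0$ is a real constant; and $f_c > W/2$. Conditionally on $\mathbf{D} = \mathbf{d}$, we denote the
transmitted signal by

\begin{align}
x(t; \mathbf{d}) &= 2A \operatorname{Re}\Bigl( \sum_{\ell=1}^{n} c_\ell\, g(t - \ell T_s)\, e^{i 2\pi f_c t} \Bigr) \tag{28.31} \\
&= \sqrt{2} A \sum_{\ell=1}^{n} \operatorname{Re}(c_\ell)\, \underbrace{2 \operatorname{Re}\Bigl( \underbrace{\tfrac{1}{\sqrt{2}}\, g(t - \ell T_s)}_{g_{\mathrm{I},\ell,\mathrm{BB}}(t)}\, e^{i 2\pi f_c t} \Bigr)}_{g_{\mathrm{I},\ell}(t)} \nonumber \\
&\quad + \sqrt{2} A \sum_{\ell=1}^{n} \operatorname{Im}(c_\ell)\, \underbrace{2 \operatorname{Re}\Bigl( \underbrace{i \tfrac{1}{\sqrt{2}}\, g(t - \ell T_s)}_{g_{\mathrm{Q},\ell,\mathrm{BB}}(t)}\, e^{i 2\pi f_c t} \Bigr)}_{g_{\mathrm{Q},\ell}(t)}, \quad t \in \mathbb{R}, \tag{28.32}
\end{align}

where $\mathbf{c} = \varphi(\mathbf{d})$ is the result of encoding the data bits $\mathbf{d}$; where (28.32)
follows from (16.7); and where $\{g_{\mathrm{I},\ell}\}$, $\{g_{\mathrm{Q},\ell}\}$,
$\{g_{\mathrm{I},\ell,\mathrm{BB}}\}$, $\{g_{\mathrm{Q},\ell,\mathrm{BB}}\}$ are as indicated in (28.32)
and as defined in (16.8) and (16.9).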

We consider the case where, conditional on $\mathbf{D} = \mathbf{d}$, the received waveform
$\bigl(Y(t)\bigr)$ is given by

\[ Y(t) = x(t; \mathbf{d}) + N(t), \quad t \in \mathbb{R}, \tag{28.33} \]

where $\bigl(N(t)\bigr)$ is white Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W
around the carrier frequency $f_c$ (Definition 25.15.3).

28.5.2 Real Sufficient Statistics 

The representation (28.32) makes it clear that for every $\mathbf{d} \in \{0,1\}^k$ the signal
$t \mapsto x(t; \mathbf{d})$ can be expressed as a linear combination of the $2n$ real-valued
signals

\[ \{g_{\mathrm{I},\ell}\}_{\ell=1}^{n}, \quad \{g_{\mathrm{Q},\ell}\}_{\ell=1}^{n}. \tag{28.34} \]

Since these signals are integrable signals that are bandlimited to W Hz around the
carrier frequency $f_c$, it follows that the $2n$ inner products

\begin{align}
T_{\mathrm{I}}^{(\ell)} &= \int_{-\infty}^{\infty} Y(t)\, g_{\mathrm{I},\ell}(t)\, dt, \quad \ell \in \{1, \ldots, n\}, \tag{28.35a} \\
T_{\mathrm{Q}}^{(\ell)} &= \int_{-\infty}^{\infty} Y(t)\, g_{\mathrm{Q},\ell}(t)\, dt, \quad \ell \in \{1, \ldots, n\} \tag{28.35b}
\end{align}

form a real sufficient statistic for guessing $\mathbf{D}$ based on $\bigl(Y(t)\bigr)$ (Section 26.10).

To describe the distribution of the sufficient statistic conditional on each of the
hypotheses, we next express the inner products between the functions in (28.34) in
terms of the self-similarity function $R_{gg}$ of the complex pulse shape $g$

\[ R_{gg}(\tau) = \int_{-\infty}^{\infty} g(t + \tau)\, g^*(t)\, dt, \quad \tau \in \mathbb{R} \tag{28.36} \]

(Definition 11.2.1). Key to these calculations is the relationship between the inner
product between real passband signals and the inner product between their complex
baseband representations (Theorem 7.6.10). Thus,

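\begin{align}
\langle g_{\mathrm{I},\ell'}, g_{\mathrm{I},\ell} \rangle
&= 2 \operatorname{Re}\bigl( \langle g_{\mathrm{I},\ell',\mathrm{BB}}, g_{\mathrm{I},\ell,\mathrm{BB}} \rangle \bigr) \nonumber \\
&= \operatorname{Re}\bigl( \langle t \mapsto g(t - \ell' T_s),\ t \mapsto g(t - \ell T_s) \rangle \bigr) \nonumber \\
&= \operatorname{Re}\Bigl( \int_{-\infty}^{\infty} g(t - \ell' T_s)\, g^*(t - \ell T_s)\, dt \Bigr) \nonumber \\
&= \operatorname{Re}\bigl( R_{gg}((\ell - \ell') T_s) \bigr), \quad \ell, \ell' \in \mathbb{Z}, \tag{28.37a}
\end{align}

where the first equality follows by relating the inner product in passband to the
inner product in baseband; the second from the expressions for the corresponding
baseband representations (16.9a); the third from the definition of the inner product
for complex-valued signals (3.4); and the final equality from the definition of the
self-similarity function (28.36). Similarly,

\begin{align}
\langle g_{\mathrm{Q},\ell'}, g_{\mathrm{Q},\ell} \rangle
&= 2 \operatorname{Re}\bigl( \langle g_{\mathrm{Q},\ell',\mathrm{BB}}, g_{\mathrm{Q},\ell,\mathrm{BB}} \rangle \bigr) \nonumber \\
&= \operatorname{Re}\bigl( \langle t \mapsto i\, g(t - \ell' T_s),\ t \mapsto i\, g(t - \ell T_s) \rangle \bigr) \nonumber \\
&= \operatorname{Re}\Bigl( \int_{-\infty}^{\infty} i\, g(t - \ell' T_s)\, (-i)\, g^*(t - \ell T_s)\, dt \Bigr) \nonumber \\
&= \operatorname{Re}\Bigl( \int_{-\infty}^{\infty} g(t - \ell' T_s)\, g^*(t - \ell T_s)\, dt \Bigr) \nonumber \\
&= \operatorname{Re}\bigl( R_{gg}((\ell - \ell') T_s) \bigr), \quad \ell, \ell' \in \mathbb{Z}, \tag{28.37b}
\end{align}

and

\begin{align}
\langle g_{\mathrm{Q},\ell'}, g_{\mathrm{I},\ell} \rangle
&= 2 \operatorname{Re}\bigl( \langle g_{\mathrm{Q},\ell',\mathrm{BB}}, g_{\mathrm{I},\ell,\mathrm{BB}} \rangle \bigr) \nonumber \\
&= \operatorname{Re}\bigl( \langle t \mapsto i\, g(t - \ell' T_s),\ t \mapsto g(t - \ell T_s) \rangle \bigr) \nonumber \\
&= \operatorname{Re}\Bigl( i \int_{-\infty}^{\infty} g(t - \ell' T_s)\, g^*(t - \ell T_s)\, dt \Bigr) \nonumber \\
&= -\operatorname{Im}\Bigl( \int_{-\infty}^{\infty} g(t - \ell' T_s)\, g^*(t - \ell T_s)\, dt \Bigr) \nonumber \\
&= -\operatorname{Im}\bigl( R_{gg}((\ell - \ell') T_s) \bigr), \quad \ell, \ell' \in \mathbb{Z}, \tag{28.37c}
\end{align}

where the first equality leading to (28.37c) follows from the relationship between
the inner product between real passband signals and the inner product between
their baseband representations (Theorem 7.6.10); the second from the expressions
for the corresponding baseband representations (16.9); the third from the definition
of the inner product between complex signals (3.4); the fourth from the identity
$\operatorname{Re}(iz) = -\operatorname{Im}(z)$; and where the last equality follows from the
definition of the self-similarity function of complex signals (28.36).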

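We are now ready to compute the conditional law of the sufficient statistic given
each of the hypotheses. Conditional on $\mathbf{D} = \mathbf{d}$ with corresponding
$\mathbf{c} = \varphi(\mathbf{d})$, the $2n$ random variables
$\bigl\{T_{\mathrm{I}}^{(\ell)}, T_{\mathrm{Q}}^{(\ell)}\bigr\}_{\ell=1}^{n}$ are jointly Gaussian
(Section 26.10). Their conditional law is thus fully specified by the conditional mean
vector and by the conditional covariance matrix. We begin with the computation of the former:

\begin{align}
\mathrm{E}\bigl[ T_{\mathrm{I}}^{(\ell)} \bigm| \mathbf{D} = \mathbf{d} \bigr]
&= \bigl\langle t \mapsto x(t; \mathbf{d}),\ g_{\mathrm{I},\ell} \bigr\rangle \nonumber \\
&= \Bigl\langle \sqrt{2} A \sum_{\ell'=1}^{n} \operatorname{Re}(c_{\ell'})\, g_{\mathrm{I},\ell'} + \sqrt{2} A \sum_{\ell'=1}^{n} \operatorname{Im}(c_{\ell'})\, g_{\mathrm{Q},\ell'},\ g_{\mathrm{I},\ell} \Bigr\rangle \nonumber \\
&= \sqrt{2} A \sum_{\ell'=1}^{n} \Bigl( \operatorname{Re}(c_{\ell'})\, \langle g_{\mathrm{I},\ell'}, g_{\mathrm{I},\ell} \rangle + \operatorname{Im}(c_{\ell'})\, \langle g_{\mathrm{Q},\ell'}, g_{\mathrm{I},\ell} \rangle \Bigr) \nonumber \\
&= \sqrt{2} A \sum_{\ell'=1}^{n} \Bigl( \operatorname{Re}(c_{\ell'}) \operatorname{Re}\bigl( R_{gg}((\ell - \ell') T_s) \bigr) - \operatorname{Im}(c_{\ell'}) \operatorname{Im}\bigl( R_{gg}((\ell - \ell') T_s) \bigr) \Bigr) \nonumber \\
&= \sqrt{2} A \sum_{\ell'=1}^{n} \operatorname{Re}\bigl( c_{\ell'}\, R_{gg}((\ell - \ell') T_s) \bigr) \nonumber \\
&= \sqrt{2} A \operatorname{Re}\Bigl( \sum_{\ell'=1}^{n} c_{\ell'}\, R_{gg}((\ell - \ell') T_s) \Bigr), \tag{28.38a}
\end{align}

where the first equality follows from the definition of $T_{\mathrm{I}}^{(\ell)}$ (28.35a), from
(28.33), and from our assumption that the noise $\bigl(N(t)\bigr)$ is of zero mean; the second
from (28.32); the third from the linearity of the inner product; the fourth by expressing
the inner products using the self-similarity function, i.e., using (28.37a) and
(28.37c); the fifth by the complex-numbers identity
$\operatorname{Re}(wz) = \operatorname{Re}(w)\operatorname{Re}(z) - \operatorname{Im}(w)\operatorname{Im}(z)$;
and the final equality because the sum of the real parts is the real
part of the sum. Similarly,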



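\begin{align}
\mathrm{E}\bigl[ T_{\mathrm{Q}}^{(\ell)} \bigm| \mathbf{D} = \mathbf{d} \bigr]
&= \bigl\langle t \mapsto x(t; \mathbf{d}),\ g_{\mathrm{Q},\ell} \bigr\rangle \nonumber \\
&= \Bigl\langle \sqrt{2} A \sum_{\ell'=1}^{n} \operatorname{Re}(c_{\ell'})\, g_{\mathrm{I},\ell'} + \sqrt{2} A \sum_{\ell'=1}^{n} \operatorname{Im}(c_{\ell'})\, g_{\mathrm{Q},\ell'},\ g_{\mathrm{Q},\ell} \Bigr\rangle \nonumber \\
&= \sqrt{2} A \sum_{\ell'=1}^{n} \Bigl( \operatorname{Re}(c_{\ell'})\, \langle g_{\mathrm{I},\ell'}, g_{\mathrm{Q},\ell} \rangle + \operatorname{Im}(c_{\ell'})\, \langle g_{\mathrm{Q},\ell'}, g_{\mathrm{Q},\ell} \rangle \Bigr) \nonumber \\
&= \sqrt{2} A \sum_{\ell'=1}^{n} \Bigl( \operatorname{Re}(c_{\ell'}) \bigl( -\operatorname{Im}\bigl( R_{gg}((\ell' - \ell) T_s) \bigr) \bigr) + \operatorname{Im}(c_{\ell'}) \operatorname{Re}\bigl( R_{gg}((\ell - \ell') T_s) \bigr) \Bigr) \nonumber \\
&= \sqrt{2} A \sum_{\ell'=1}^{n} \Bigl( \operatorname{Re}(c_{\ell'}) \operatorname{Im}\bigl( R_{gg}((\ell - \ell') T_s) \bigr) + \operatorname{Im}(c_{\ell'}) \operatorname{Re}\bigl( R_{gg}((\ell - \ell') T_s) \bigr) \Bigr) \nonumber \\
&= \sqrt{2} A \sum_{\ell'=1}^{n} \operatorname{Im}\bigl( c_{\ell'}\, R_{gg}((\ell - \ell') T_s) \bigr) \nonumber \\
&= \sqrt{2} A \operatorname{Im}\Bigl( \sum_{\ell'=1}^{n} c_{\ell'}\, R_{gg}((\ell - \ell') T_s) \Bigr), \tag{28.38b}
\end{align}

where the first equality follows from the definition of $T_{\mathrm{Q}}^{(\ell)}$ (28.35b), from
(28.33), and from our assumption that the noise $\bigl(N(t)\bigr)$ is of zero mean; the second
from (28.32); the third from the linearity of the inner product; the fourth by expressing
the inner products using the self-similarity function, i.e., using (28.37c) and
(28.37b); the fifth by the conjugate symmetry of the self-similarity function
(Proposition 11.2.2); the sixth by the complex-numbers identity
$\operatorname{Im}(wz) = \operatorname{Re}(w)\operatorname{Im}(z) + \operatorname{Im}(w)\operatorname{Re}(z)$;
and the final equality by noting that the sum of the imaginary parts
is equal to the imaginary part of the sum.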

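The conditional covariances are easily computed using Note 25.15.4. Using the
inner products expressions (28.37), we obtain:

\begin{align}
\operatorname{Cov}\bigl[ T_{\mathrm{I}}^{(\ell')}, T_{\mathrm{I}}^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr]
&= \frac{N_0}{2}\, \langle g_{\mathrm{I},\ell'}, g_{\mathrm{I},\ell''} \rangle \nonumber \\
&= \frac{N_0}{2} \operatorname{Re}\bigl( R_{gg}((\ell' - \ell'') T_s) \bigr), \tag{28.39a} \\
\operatorname{Cov}\bigl[ T_{\mathrm{Q}}^{(\ell')}, T_{\mathrm{Q}}^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr]
&= \frac{N_0}{2}\, \langle g_{\mathrm{Q},\ell'}, g_{\mathrm{Q},\ell''} \rangle \nonumber \\
&= \frac{N_0}{2} \operatorname{Re}\bigl( R_{gg}((\ell' - \ell'') T_s) \bigr), \tag{28.39b}
\end{align}

and

\begin{align}
\operatorname{Cov}\bigl[ T_{\mathrm{I}}^{(\ell')}, T_{\mathrm{Q}}^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr]
&= \frac{N_0}{2}\, \langle g_{\mathrm{I},\ell'}, g_{\mathrm{Q},\ell''} \rangle \nonumber \\
&= -\frac{N_0}{2} \operatorname{Im}\bigl( R_{gg}((\ell' - \ell'') T_s) \bigr). \tag{28.39c}
\end{align}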



We summarize our results on QAM detection in white Gaussian noise as follows. 

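Proposition 28.5.1 (QAM Detection in White Noise: Real Sufficient Statistics).
Let a QAM signal (28.32) of an integrable pulse shape $g(\cdot)$ that is bandlimited
to W/2 Hz be observed in white Gaussian noise of PSD $N_0/2$ with respect to the
bandwidth W around the carrier frequency $f_c$. Then:

(i) The $2n$ inner products

\begin{align}
T_{\mathrm{I}}^{(\ell)} &= \int_{-\infty}^{\infty} Y(t)\, g_{\mathrm{I},\ell}(t)\, dt, \quad \ell \in \{1, \ldots, n\}, \tag{28.40a} \\
T_{\mathrm{Q}}^{(\ell)} &= \int_{-\infty}^{\infty} Y(t)\, g_{\mathrm{Q},\ell}(t)\, dt, \quad \ell \in \{1, \ldots, n\} \tag{28.40b}
\end{align}

form a sufficient statistic for guessing $\mathbf{D}$ based on $\bigl(Y(t)\bigr)$, where

\begin{align*}
g_{\mathrm{I},\ell}(t) &= 2 \operatorname{Re}\Bigl( \tfrac{1}{\sqrt{2}}\, g(t - \ell T_s)\, e^{i 2\pi f_c t} \Bigr), \quad t \in \mathbb{R}, \\
g_{\mathrm{Q},\ell}(t) &= 2 \operatorname{Re}\Bigl( \tfrac{1}{\sqrt{2}}\, i\, g(t - \ell T_s)\, e^{i 2\pi f_c t} \Bigr), \quad t \in \mathbb{R}.
\end{align*}

(ii) Conditional on $\mathbf{D} = \mathbf{d}$ with corresponding transmitted symbols
$\mathbf{c} = \varphi(\mathbf{d})$, these $2n$ real random variables are jointly Gaussian with
conditional means as specified by (28.38) and with conditional covariances as specified
by (28.39).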



28.5.3 Complex Sufficient Statistics 

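The notation is simpler if we introduce the $n$ complex random variables

\[ T^{(\ell)} \triangleq T_{\mathrm{I}}^{(\ell)} + i\, T_{\mathrm{Q}}^{(\ell)} = \int_{-\infty}^{\infty} Y(t)\, g_{\mathrm{I},\ell}(t)\, dt + i \int_{-\infty}^{\infty} Y(t)\, g_{\mathrm{Q},\ell}(t)\, dt, \quad \ell \in \{1, \ldots, n\}. \tag{28.41} \]

These $n$ complex random variables form a sufficient statistic in the sense that their
real and imaginary parts form a sufficient statistic. Using (28.38) we obtain

\begin{align}
\mathrm{E}\bigl[ T^{(\ell)} \bigm| \mathbf{D} = \mathbf{d} \bigr]
&= \mathrm{E}\bigl[ T_{\mathrm{I}}^{(\ell)} \bigm| \mathbf{D} = \mathbf{d} \bigr] + i\, \mathrm{E}\bigl[ T_{\mathrm{Q}}^{(\ell)} \bigm| \mathbf{D} = \mathbf{d} \bigr] \nonumber \\
&= \sqrt{2} A \operatorname{Re}\Bigl( \sum_{\ell'=1}^{n} c_{\ell'}\, R_{gg}((\ell - \ell') T_s) \Bigr) + i \sqrt{2} A \operatorname{Im}\Bigl( \sum_{\ell'=1}^{n} c_{\ell'}\, R_{gg}((\ell - \ell') T_s) \Bigr) \nonumber \\
&= \sqrt{2} A \sum_{\ell'=1}^{n} c_{\ell'}\, R_{gg}((\ell - \ell') T_s), \quad \ell \in \{1, \ldots, n\}. \tag{28.42}
\end{align}

The advantage of the complex notation is that, as we shall see in Proposition 28.5.2
ahead, conditional on $\mathbf{D} = \mathbf{d}$, the random vector
$\mathbf{T} - \mathrm{E}[\mathbf{T} \mid \mathbf{D} = \mathbf{d}]$ is proper (Definition 17.4.1).
And since conditionally on $\mathbf{D} = \mathbf{d}$ it is also Gaussian, it follows from
Proposition 24.3.11 that, conditional on $\mathbf{D} = \mathbf{d}$, the random vector
$\mathbf{T} - \mathrm{E}[\mathbf{T} \mid \mathbf{D} = \mathbf{d}]$ is a circularly-symmetric complex
Gaussian (Definition 24.3.2). Its conditional law is thus determined by its conditional
covariance matrix (Corollary 24.3.8). This covariance matrix is an $n \times n$ (complex)
matrix, whereas the covariance matrix for the $2n$ real variables in Proposition 28.5.1 is
a $(2n) \times (2n)$ (real) matrix.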

We summarize our results for QAM detection with complex sufficient statistics in 
the following. 

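Proposition 28.5.2 (QAM in White Noise: Complex Sufficient Statistics). Consider
the setup of Proposition 28.5.1.

(i) The complex random vector $\mathbf{T} = \bigl(T^{(1)}, \ldots, T^{(n)}\bigr)^{\mathsf{T}}$ defined by

\[ T^{(\ell)} = \int_{-\infty}^{\infty} Y(t)\, g_{\mathrm{I},\ell}(t)\, dt + i \int_{-\infty}^{\infty} Y(t)\, g_{\mathrm{Q},\ell}(t)\, dt, \quad \ell \in \{1, \ldots, n\}, \]

forms a sufficient statistic for guessing $\mathbf{D}$ based on $\bigl(Y(t)\bigr)$.

(ii) The $\ell$-th component of $\mathbf{T}$ can be expressed as

\[ T^{(\ell)} = \sqrt{2} A \sum_{\ell'=1}^{n} C_{\ell'}\, R_{gg}((\ell - \ell') T_s) + Z^{(\ell)}, \quad \ell \in \{1, \ldots, n\}, \]

where $R_{gg}$ is the self-similarity function of the pulse shape $g(\cdot)$ (28.36), and
where the random vector $\mathbf{Z} = \bigl(Z^{(1)}, \ldots, Z^{(n)}\bigr)^{\mathsf{T}}$ is independent
of $\mathbf{D}$ and is a circularly-symmetric complex Gaussian of covariance

\begin{align}
\operatorname{Cov}\bigl[ Z^{(\ell')}, Z^{(\ell'')} \bigr] &= \mathrm{E}\bigl[ Z^{(\ell')} \bigl(Z^{(\ell'')}\bigr)^* \bigr] \nonumber \\
&= N_0\, R_{gg}((\ell' - \ell'') T_s), \quad \ell', \ell'' \in \{1, \ldots, n\}. \tag{28.43}
\end{align}

(iii) If the time shifts of the pulse shape by integer multiples of $T_s$ are orthonormal,
then

\[ T^{(\ell)} = \sqrt{2} A\, C_\ell + Z^{(\ell)}, \quad \ell \in \{1, \ldots, n\}, \tag{28.44} \]

where the complex random variables $\{Z^{(\ell)}\}$ are independent of $\{D_j\}$ and are
IID circularly-symmetric complex Gaussians of variance $N_0$.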

Proof. Part (i) follows directly from Proposition 28.5.1 because, by definition, the 
sufficiency of T is equivalent to the sufficiency of its real and imaginary parts. 

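To prove Part (ii) define

\[ Z^{(\ell)} \triangleq T^{(\ell)} - \sqrt{2} A \sum_{\ell'=1}^{n} C_{\ell'}\, R_{gg}((\ell - \ell') T_s), \quad \ell \in \{1, \ldots, n\}, \tag{28.45} \]

and note that by (28.42) the conditional distribution of $\mathbf{Z}$ given $\mathbf{D} = \mathbf{d}$
is of zero mean. Moreover, from Proposition 28.5.1 and from the definition of a complex
Gaussian random vector as one whose real and imaginary parts are jointly Gaussian
(Definition 24.3.6), it follows that, conditional on $\mathbf{D} = \mathbf{d}$, the vector
$\mathbf{Z}$ is Gaussian. To prove that it is proper we compute

\begin{align*}
\mathrm{E}\bigl[ Z^{(\ell')} Z^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr]
&= \mathrm{E}\bigl[ \operatorname{Re}(Z^{(\ell')}) \operatorname{Re}(Z^{(\ell'')}) - \operatorname{Im}(Z^{(\ell')}) \operatorname{Im}(Z^{(\ell'')}) \bigm| \mathbf{D} = \mathbf{d} \bigr] \\
&\quad + i\, \mathrm{E}\bigl[ \operatorname{Re}(Z^{(\ell')}) \operatorname{Im}(Z^{(\ell'')}) + \operatorname{Im}(Z^{(\ell')}) \operatorname{Re}(Z^{(\ell'')}) \bigm| \mathbf{D} = \mathbf{d} \bigr] \\
&= \operatorname{Cov}\bigl[ T_{\mathrm{I}}^{(\ell')}, T_{\mathrm{I}}^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr] - \operatorname{Cov}\bigl[ T_{\mathrm{Q}}^{(\ell')}, T_{\mathrm{Q}}^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr] \\
&\quad + i \Bigl( \operatorname{Cov}\bigl[ T_{\mathrm{I}}^{(\ell')}, T_{\mathrm{Q}}^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr] + \operatorname{Cov}\bigl[ T_{\mathrm{Q}}^{(\ell')}, T_{\mathrm{I}}^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr] \Bigr) \\
&= 0, \quad \ell', \ell'' \in \{1, \ldots, n\},
\end{align*}

where the second equality follows from (28.45) and the last equality from (28.39).

The calculation of the conditional covariance matrix is very similar except that
$Z^{(\ell'')}$ is now conjugated:

\begin{align}
\operatorname{Cov}\bigl[ Z^{(\ell')}, Z^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr]
&= \mathrm{E}\bigl[ \operatorname{Re}(Z^{(\ell')}) \operatorname{Re}(Z^{(\ell'')}) + \operatorname{Im}(Z^{(\ell')}) \operatorname{Im}(Z^{(\ell'')}) \bigm| \mathbf{D} = \mathbf{d} \bigr] \nonumber \\
&\quad + i\, \mathrm{E}\bigl[ -\operatorname{Re}(Z^{(\ell')}) \operatorname{Im}(Z^{(\ell'')}) + \operatorname{Im}(Z^{(\ell')}) \operatorname{Re}(Z^{(\ell'')}) \bigm| \mathbf{D} = \mathbf{d} \bigr] \nonumber \\
&= \operatorname{Cov}\bigl[ T_{\mathrm{I}}^{(\ell')}, T_{\mathrm{I}}^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr] + \operatorname{Cov}\bigl[ T_{\mathrm{Q}}^{(\ell')}, T_{\mathrm{Q}}^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr] \nonumber \\
&\quad + i \Bigl( -\operatorname{Cov}\bigl[ T_{\mathrm{I}}^{(\ell')}, T_{\mathrm{Q}}^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr] + \operatorname{Cov}\bigl[ T_{\mathrm{Q}}^{(\ell')}, T_{\mathrm{I}}^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr] \Bigr) \nonumber \\
&= \frac{N_0}{2} \operatorname{Re}\bigl( R_{gg}((\ell' - \ell'') T_s) \bigr) + \frac{N_0}{2} \operatorname{Re}\bigl( R_{gg}((\ell' - \ell'') T_s) \bigr) \nonumber \\
&\quad + i \Bigl( \frac{N_0}{2} \operatorname{Im}\bigl( R_{gg}((\ell' - \ell'') T_s) \bigr) - \frac{N_0}{2} \operatorname{Im}\bigl( R_{gg}((\ell'' - \ell') T_s) \bigr) \Bigr) \nonumber \\
&= N_0\, R_{gg}((\ell' - \ell'') T_s), \quad \ell', \ell'' \in \{1, \ldots, n\}, \tag{28.46}
\end{align}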



where the first equality follows from the definition of the covariance between com- 
plex random variables (17.17); the second by (28.45); the third by (28.39); and the 
last equality by the conjugate-symmetry of the self-similarity function (Proposi- 
tion 11.2.2 (iii)). 

Conditional on D = d, the complex n-vector Z is thus a proper Gaussian, and its 
conditional law is thus fully specified by its conditional covariance matrix (Corol- 
lary 24.3.8). By (28.46), this conditional covariance matrix does not depend on d, 
and we thus conclude that the conditional law of Z conditional on D = d does not 
depend on d, i.e., that Z is independent of D. 

Part (iii) follows from Part (ii). □ 
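Part (iii) can be illustrated numerically. The sketch below rests on a modeling assumption spelled out here: a scalar circularly-symmetric complex Gaussian of variance $N_0$ is generated with independent real and imaginary parts, each $\mathcal{N}(0, N_0/2)$, which agrees with (28.39) when $R_{gg}(0) = 1$. The function name `qam_outputs`, the 4-QAM symbols, and the numerical values are hypothetical.

```python
import math
import random


def qam_outputs(c, A, N0, rng):
    """Draw T^(l) = sqrt(2)*A*c_l + Z^(l), as in (28.44).

    Z^(l) is modeled with independent real and imaginary parts, each
    N(0, N0/2), so that E[|Z|^2] = N0 and E[Z^2] = 0.
    """
    sigma = math.sqrt(N0 / 2.0)
    return [
        math.sqrt(2.0) * A * c_l
        + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        for c_l in c
    ]


rng = random.Random(0)                   # fixed seed for repeatability
A, N0 = 1.0, 0.2                         # hypothetical values
c = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]   # hypothetical 4-QAM symbols
t = qam_outputs(c, A, N0, rng)

# Empirical second moments of the noise alone (all symbols set to zero).
z = qam_outputs([0j] * 20000, A, N0, rng)
variance = sum(abs(v) ** 2 for v in z) / len(z)   # should be close to N0
pseudo_variance = sum(v * v for v in z) / len(z)  # should be close to 0
```

The near-zero pseudo-variance reflects properness, and the empirical variance close to $N_0$ matches (28.43) evaluated at $\ell' = \ell''$.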



28.6 Additional Reading 

Proposition 28.2.1 and Proposition 28.5.2 are the starting points of much of the 
literature on equalization and on the use of the Viterbi Algorithm for channels 
with inter-symbol interference (ISI). See, for example, (Proakis, 2000, Chapter 10), 
(Viterbi and Omura, 1979, Chapter 4, Section 4.9), and (Barry, Lee, and Messer- 
schmitt, 2004, Chapter 8). 



28.7 Exercises 

Exercise 28.1 (A Dispersive Channel). Let the transmitted signal $\bigl(X(t)\bigr)$ be as in
(28.1), and let the received signal $\bigl(Y(t)\bigr)$ be given by

\[ Y(t) = (X \star h)(t) + N(t), \quad t \in \mathbb{R}, \]

where $\bigl(N(t)\bigr)$ is white Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W,
and where $h$ is the impulse response of some stable real filter.

(i) Show that the $n$ inner products

\[ \int_{-\infty}^{\infty} Y(t)\, (g \star h)(t - \ell T_s)\, dt, \quad \ell \in \{1, \ldots, n\} \]

form a sufficient statistic for guessing $D_1, \ldots, D_k$ based on $\bigl(Y(t)\bigr)$.

(ii) Compute their conditional law.

Exercise 28.2 (PAM in Colored Noise). Let the transmitted signal $\bigl(X(t)\bigr)$ be as in
(28.1), and let the received signal $\bigl(Y(t)\bigr)$ be given by

\[ Y(t) = X(t) + N(t), \quad t \in \mathbb{R}, \]

where $\bigl(N(t)\bigr)$ is a centered, stationary, measurable, Gaussian SP of PSD
$S_{NN}$ that can be whitened with respect to the bandwidth W. Let $h$ be the impulse
response of a whitening filter for $\bigl(N(t)\bigr)$ with respect to W.

(i) Show that the $n$ inner products

\[ \int_{-\infty}^{\infty} Y(t)\, (g \star h \star h)(t - \ell T_s)\, dt, \quad \ell \in \{1, \ldots, n\} \]

form a sufficient statistic for guessing $D_1, \ldots, D_k$ based on $\bigl(Y(t)\bigr)$.

(ii) Compute their conditional law.

Exercise 28.3 (A Channel with an Echo). Data bits $D_1, \ldots, D_k$ are mapped to real
symbols $X_1, \ldots, X_k$ using the antipodal mapping, so $X_\ell = 1 - 2 D_\ell$, for every
$\ell \in \{1, \ldots, k\}$. The transmitted signal $\bigl(X(t)\bigr)$ is given by
$X(t) = A \sum_\ell X_\ell\, \phi(t - \ell T_s)$, where $\phi$ is an integrable signal that is
bandlimited to W Hz and that satisfies the orthonormality condition (28.18). The
received signal $\bigl(Y(t)\bigr)$ is

\[ Y(t) = X(t) + \alpha X(t - T_s) + N(t), \quad t \in \mathbb{R}, \]

where $\bigl(N(t)\bigr)$ is white Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W,
and $\alpha$ is a real constant. Let $Y_\ell$ be the time-$\ell T_s$ output of a filter that is
matched to $\phi$ and that is fed $\bigl(Y(t)\bigr)$.

(i) Do $Y_1, \ldots, Y_{k+1}$ form a sufficient statistic for guessing $(D_1, \ldots, D_k)$?

(ii) Consider a suboptimal rule that guesses "$D_j = 0$" if $Y_j > 0$, and otherwise guesses
"$D_j = 1$." Express the probability that this rule guesses $D_j$ incorrectly in terms
of $j$, $\alpha$, $A$, and $N_0$. To what does this probability of error converge when $N_0$
tends to zero?

Exercise 28.4 (Another Channel with an Echo). Consider the setup of Exercise 28.3 but
where the echo is delayed by a noninteger multiple of the baud period. Thus,

\[ Y(t) = X(t) + \alpha X(t - \tau) + N(t), \quad t \in \mathbb{R}, \]

where $0 < \tau < T_s$. Show that the $2k$ inner products

\[ \int_{-\infty}^{\infty} Y(t)\, \phi(t - \ell T_s)\, dt, \quad \int_{-\infty}^{\infty} Y(t)\, \phi(t - \ell T_s - \tau)\, dt, \quad \ell \in \{1, \ldots, k\} \]

form a sufficient statistic for guessing $(D_1, \ldots, D_k)$ based on $\bigl(Y(t)\bigr)$.

Exercise 28.5 (A Multiple-Access Scenario). Two transmitters communicate with a single
receiver. The receiver observes the signal

\[ Y(t) = A_1 X_1\, \phi_1(t) + A_2 X_2\, \phi_2(t) + N(t), \quad t \in \mathbb{R}, \]

where $A_1, A_2 > 0$; $\phi_1$ and $\phi_2$ are orthonormal integrable signals that are
bandlimited to W Hz; the pair $(X_1, X_2)$ takes value in the set
$\{(+1,+1), (+1,-1), (-1,+1), (-1,-1)\}$ equiprobably; and where $\bigl(N(t)\bigr)$ is white
Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W.

(i) Can you recover $(X_1, X_2)$ from $A_1 X_1 \phi_1 + A_2 X_2 \phi_2$?

(ii) Find an optimal receiver for guessing $(X_1, X_2)$ based on $\bigl(Y(t)\bigr)$.

(iii) Compute the optimal probability of error for guessing $(X_1, X_2)$ based on
$\bigl(Y(t)\bigr)$.

(iv) Suppose that a genie informs the receiver of the value of $X_2$. How should the
receiver then guess $X_1$ based on $\bigl(Y(t)\bigr)$ and the information provided by the genie?

(v) A receiver guesses "$X_1 = +1$" if $\langle Y, \phi_1 \rangle > 0$ and guesses "$X_1 = -1$"
otherwise. Is this receiver optimal for guessing $X_1$?

Exercise 28.6 (Two Receiver Antennas). Consider the setup of (28.1). We observe two
signals $\bigl(Y_1(t)\bigr)$, $\bigl(Y_2(t)\bigr)$ that are given at every epoch $t \in \mathbb{R}$ by

\[ Y_1(t) = (X \star h_1)(t) + N_1(t), \quad Y_2(t) = (X \star h_2)(t) + N_2(t), \]

where $h_1$ and $h_2$ are the impulse responses of two real stable filters, and where the
stochastic processes $\bigl(N_1(t)\bigr)$ and $\bigl(N_2(t)\bigr)$ are independent white Gaussian
noise processes of PSD $N_0/2$ with respect to the bandwidth W.

(i) Extend Definition 26.3.1 to the case where the observation consists of two stochastic
processes.

(ii) Show that the $2n$ inner products

\[ \int_{-\infty}^{\infty} Y_1(t)\, (g \star h_1)(t - \ell T_s)\, dt, \quad \int_{-\infty}^{\infty} Y_2(t)\, (g \star h_2)(t - \ell T_s)\, dt, \quad \ell \in \{1, \ldots, n\} \]

form a sufficient statistic for guessing $D_1, \ldots, D_k$ based on $\bigl(Y_1(t)\bigr)$ and
$\bigl(Y_2(t)\bigr)$.

Exercise 28.7 (Bits of Unequal Importance). Consider the setup of Section 28.3 but
where some data bits are more important than others. We therefore wish to minimize the
weighted average

\[ \sum_{j=1}^{k} \alpha_j \Pr\bigl[ \hat{D}_j \neq D_j \bigr] \]

for some positive $\alpha_1, \ldots, \alpha_k$ that sum to one.

(i) Is it still optimal to base our guess of $D_1, \ldots, D_k$ on the inner products in (28.11)?

(ii) Does this criterion lead to a different receiver design than the bit error rate?

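Exercise 28.8 (Sandwiching the Probability of a Message Error). In the notation of
Section 28.3, show that

\[ \frac{1}{k} \sum_{j=1}^{k} \Pr\bigl[ \hat{D}_j \neq D_j \bigr] \le \max_{1 \le j \le k} \Pr\bigl[ \hat{D}_j \neq D_j \bigr] \le \Pr\bigl[ \hat{\mathbf{D}} \neq \mathbf{D} \bigr] \le \sum_{j=1}^{k} \Pr\bigl[ \hat{D}_j \neq D_j \bigr]. \]

Exercise 28.9 (Sandwiching the Bit Error Rate). In the notation of Section 28.3, show
that

\[ \frac{1}{k} \Pr\bigl[ \hat{\mathbf{D}} \neq \mathbf{D} \bigr] \le \frac{1}{k} \sum_{j=1}^{k} \Pr\bigl[ \hat{D}_j \neq D_j \bigr] \le \Pr\bigl[ \hat{\mathbf{D}} \neq \mathbf{D} \bigr]. \]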

Exercise 28.10 (Transmission via an Unknown Dispersive Channel). A random switch
that is outside our control and whose realization is not observed determines whether the
observed output $\bigl(Y(t)\bigr)$ is given by

\[ X \star h_1 + N \quad \text{or} \quad X \star h_2 + N, \]

where $\bigl(X(t)\bigr)$ is the transmitted signal of (28.1); $\bigl(N(t)\bigr)$ is white Gaussian noise
of PSD $N_0/2$ with respect to the bandwidth W; and $h_1$ and $h_2$ are the impulse responses
of two stable real filters. Show that the $2n$ inner products

\[ \int_{-\infty}^{\infty} Y(t)\, (g \star h_1)(t - \ell T_s)\, dt, \quad \int_{-\infty}^{\infty} Y(t)\, (g \star h_2)(t - \ell T_s)\, dt, \quad \ell \in \{1, \ldots, n\} \]

form a sufficient statistic for guessing $D_1, \ldots, D_k$ based on $\bigl(Y(t)\bigr)$.



Chapter 29 

Linear Binary Block Codes with Antipodal 
Signaling 



29.1 Introduction and Setup 

We have thus far said very little about the design of good encoders. We mentioned
block encoders but, apart from defining and studying some of their basic
properties (such as rate and energy per symbol), we have said very little about
how to design such encoders. The design of block encoders falls under the heading
of "Coding Theory" and is the subject of numerous books such as (MacWilliams
and Sloane, 1977), (van Lint, 1998), (Blahut, 2002), (Roth, 2006) and (Richardson
and Urbanke, 2008). Here we provide only a glimpse of this theory for one
class of such encoders: the class of binary linear block encoders with antipodal
pulse amplitude modulation. Such encoders map the data bits $D_1, \ldots, D_K$ to the
real symbols $X_1, \ldots, X_N$ by first applying a one-to-one linear mapping of binary
$K$-tuples to binary $N$-tuples and by then applying the antipodal mapping

\[ 0 \mapsto +1, \qquad 1 \mapsto -1 \]

to each component of the binary $N$-tuple to produce the $\{\pm 1\}$-valued symbols
$X_1, \ldots, X_N$.

Our emphasis in this chapter is not on the design of such encoders, but on how
their properties influence the performance of communication systems that employ
them in combination with Pulse Amplitude Modulation. We thus assume that the
transmitted waveform is given by

\[ A \sum_{\ell} X_\ell\, \phi(t - \ell T_s), \quad t \in \mathbb{R}, \tag{29.1} \]

where $A > 0$ is a scaling factor, $T_s > 0$ is the baud period, $\phi(\cdot)$ is a real integrable
signal that is bandlimited to W Hz, and where the time shifts of $\phi(\cdot)$ by integer
multiples of $T_s$ are orthonormal

\[ \int_{-\infty}^{\infty} \phi(t - \ell T_s)\, \phi(t - \ell' T_s)\, dt = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}. \tag{29.2} \]

The summation in (29.1) can be finite, as in the block-mode that we discussed
in Section 10.4, or infinite, as in the bi-infinite block-mode that we discussed in
Section 14.5.2. We shall further assume that the PAM signal is transmitted over
an additive noise channel where the transmitted signal is corrupted by Gaussian
noise that is white with respect to the bandwidth W. We also assume that the
data are IID random bits (Definition 14.5.1).

In Section 29.2 we briefly discuss the binary field F2 and discuss some of the basic 
properties of the set of all binary K-tuples when it is viewed as a vector space over 
this field. This allows us in Section 29.3 to define linear binary encoders and codes. 
Section 29.4 introduces binary encoders with antipodal signaling, and Section 29.5 
discusses the power and power spectral density when they are employed in conjunc- 
tion with PAM. Section 29.6 begins the study of decoding with a discussion of two 
performance criteria: the probability of a block error (also called message error) 
and the probability of a bit error. It also recalls the discrete-time single-block 
channel model. Section 29.7 contains the design and performance analysis of the 
guessing rule that minimizes the probability of a block error, and Section 29.8 con- 
tains a similar analysis for the guessing rule that minimizes the probability of a bit 
error. Section 29.9 explains why performance analysis and simulation is often done 
under the assumption that the transmitted data is the all-zero data. Section 29.10 
discusses how the encoder and the PAM parameters influence the overall system 
performance. The chapter concludes with a discussion of the (suboptimal) Hard 
Decision decoding rule in Section 29.11 and of bounds on the minimum distance 
of a code in Section 29.12. 

29.2 The Binary Field $\mathbb{F}_2$ and the Vector Space $\mathbb{F}_2^\kappa$

29.2.1 The Binary Field $\mathbb{F}_2$

The binary field $\mathbb{F}_2$ consists of two elements that we denote by 0 and 1. An
operation that we denote by $\oplus$ is defined between any two elements of $\mathbb{F}_2$ through
the relation

\[ 0 \oplus 0 = 0, \quad 0 \oplus 1 = 1, \quad 1 \oplus 0 = 1, \quad 1 \oplus 1 = 0. \tag{29.3} \]

This operation is sometimes called "mod 2 addition" or "exclusive-or" or "GF(2)
addition." (Here GF(2) stands for the Galois Field of two elements after the French
mathematician Évariste Galois (1811-1832) who did ground-breaking work on finite
fields and groups.) Another operation, "GF(2) multiplication," is denoted by a
dot and is defined via the relation

\[ 0 \cdot 0 = 0, \quad 0 \cdot 1 = 0, \quad 1 \cdot 0 = 0, \quad 1 \cdot 1 = 1. \tag{29.4} \]

Combined with these operations, the set $\mathbb{F}_2$ forms a field, which is sometimes
called the Galois Field of size two. We leave it to the reader to verify that the $\oplus$
operation satisfies

\begin{align*}
a \oplus b &= b \oplus a, \quad a, b \in \mathbb{F}_2, \\
(a \oplus b) \oplus c &= a \oplus (b \oplus c), \quad a, b, c \in \mathbb{F}_2, \\
a \oplus 0 &= 0 \oplus a = a, \quad a \in \mathbb{F}_2, \\
a \oplus a &= 0, \quad a \in \mathbb{F}_2;
\end{align*}

and that the operations $\oplus$ and $\cdot$ satisfy the distributive law

\[ (a \oplus b) \cdot c = (a \cdot c) \oplus (b \cdot c), \quad a, b, c \in \mathbb{F}_2. \]
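Since $\mathbb{F}_2$ has only two elements, the identities left to the reader can be checked exhaustively. The following Python sketch does so; representing the field elements as the integers 0 and 1 and the helper names `gf2_add` and `gf2_mul` are choices made here for illustration.

```python
from itertools import product

F2 = (0, 1)


def gf2_add(a, b):
    """Mod-2 addition (exclusive-or), as in (29.3)."""
    return a ^ b


def gf2_mul(a, b):
    """GF(2) multiplication, as in (29.4)."""
    return a & b


commutative = all(
    gf2_add(a, b) == gf2_add(b, a) for a, b in product(F2, repeat=2)
)
associative = all(
    gf2_add(gf2_add(a, b), c) == gf2_add(a, gf2_add(b, c))
    for a, b, c in product(F2, repeat=3)
)
zero_is_neutral = all(gf2_add(a, 0) == gf2_add(0, a) == a for a in F2)
self_inverse = all(gf2_add(a, a) == 0 for a in F2)
distributive = all(
    gf2_mul(gf2_add(a, b), c) == gf2_add(gf2_mul(a, c), gf2_mul(b, c))
    for a, b, c in product(F2, repeat=3)
)
```

On the integers 0 and 1, bitwise exclusive-or and bitwise and reproduce the tables (29.3) and (29.4), which is why the two helpers are one-liners.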

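29.2.2 The Vector Space $\mathbb{F}_2^\kappa$

We denote the set of all binary $\kappa$-tuples by $\mathbb{F}_2^\kappa$ and define the
componentwise-$\oplus$ operation between $\kappa$-tuples
$\mathbf{u} = (u_1, \ldots, u_\kappa) \in \mathbb{F}_2^\kappa$ and
$\mathbf{v} = (v_1, \ldots, v_\kappa) \in \mathbb{F}_2^\kappa$ as

\[ \mathbf{u} \oplus \mathbf{v} = (u_1 \oplus v_1, \ldots, u_\kappa \oplus v_\kappa), \quad \mathbf{u}, \mathbf{v} \in \mathbb{F}_2^\kappa. \tag{29.5} \]

We define the product between a scalar $\alpha \in \mathbb{F}_2$ and a $\kappa$-tuple
$\mathbf{u} = (u_1, \ldots, u_\kappa) \in \mathbb{F}_2^\kappa$ by

\[ \alpha \cdot \mathbf{u} = (\alpha \cdot u_1, \ldots, \alpha \cdot u_\kappa). \tag{29.6} \]

With these operations the set $\mathbb{F}_2^\kappa$ forms a vector space over the field
$\mathbb{F}_2$. The all-zero $\kappa$-tuple is denoted by $\mathbf{0}$.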
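The operations (29.5) and (29.6) translate directly into code. In this sketch binary $\kappa$-tuples are represented as Python tuples of 0s and 1s; the function names `vec_add` and `scalar_mul` and the example tuples are hypothetical.

```python
def vec_add(u, v):
    """Componentwise mod-2 sum of two binary tuples of equal length, as in (29.5)."""
    if len(u) != len(v):
        raise ValueError("tuples must have the same length")
    return tuple(a ^ b for a, b in zip(u, v))


def scalar_mul(alpha, u):
    """Product of a scalar alpha in {0, 1} with a binary tuple, as in (29.6)."""
    return tuple(alpha & a for a in u)


u = (1, 0, 1, 1)
v = (0, 0, 1, 0)
zero = (0, 0, 0, 0)

w = vec_add(u, v)             # differs from u exactly where v has a one
u_plus_u = vec_add(u, u)      # every tuple is its own additive inverse
```

Because the only scalars are 0 and 1, scalar multiplication either returns the tuple unchanged or returns the all-zero tuple.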

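29.2.3 Linear Mappings

A mapping $T\colon \mathbb{F}_2^\kappa \to \mathbb{F}_2^\eta$ is said to be linear if

\[ T(\alpha \cdot \mathbf{u} \oplus \beta \cdot \mathbf{v}) = \alpha \cdot T(\mathbf{u}) \oplus \beta \cdot T(\mathbf{v}), \quad \bigl( \alpha, \beta \in \mathbb{F}_2,\ \mathbf{u}, \mathbf{v} \in \mathbb{F}_2^\kappa \bigr). \tag{29.7} \]

The kernel of a linear mapping $T\colon \mathbb{F}_2^\kappa \to \mathbb{F}_2^\eta$ is denoted by
$\operatorname{Ker}(T)$ and is the set of all $\kappa$-tuples in $\mathbb{F}_2^\kappa$ that are mapped
by $T(\cdot)$ to the all-zero $\eta$-tuple $\mathbf{0}$:

\[ \operatorname{Ker}(T) = \bigl\{ \mathbf{u} \in \mathbb{F}_2^\kappa : T(\mathbf{u}) = \mathbf{0} \bigr\}. \tag{29.8} \]

The kernel of every linear mapping contains the all-zero tuple $\mathbf{0}$.

The image of $T\colon \mathbb{F}_2^\kappa \to \mathbb{F}_2^\eta$ is denoted by
$\operatorname{Image}(T)$ and consists of those elements of $\mathbb{F}_2^\eta$ to which some
element of $\mathbb{F}_2^\kappa$ is mapped by $T(\cdot)$:

\[ \operatorname{Image}(T) = \bigl\{ T(\mathbf{u}) : \mathbf{u} \in \mathbb{F}_2^\kappa \bigr\}. \tag{29.9} \]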

The key results from Linear Algebra that we need are summarized in the following 
proposition. 

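Proposition 29.2.1. Let $T \colon \mathbb{F}_2^{\kappa} \to \mathbb{F}_2^{\eta}$ be linear.

(i) The kernel of $T(\cdot)$ is a linear subspace of $\mathbb{F}_2^{\kappa}$.

(ii) The mapping $T(\cdot)$ is one-to-one if, and only if, $\operatorname{Ker}(T) = \{\mathbf{0}\}$.

(iii) The image of $T(\cdot)$ is a linear subspace of $\mathbb{F}_2^{\eta}$.

(iv) The sum of the dimension of the kernel and the dimension of the image space
is equal to the dimension of the domain:

$$\operatorname{Dim}\bigl(\operatorname{Ker}(T)\bigr) + \operatorname{Dim}\bigl(\operatorname{Image}(T)\bigr) = \kappa. \tag{29.10}$$

(v) If $\mathcal{U}$ is a linear subspace of $\mathbb{F}_2^{\eta}$ of dimension $\kappa$, then there exists a one-to-one
linear mapping from $\mathbb{F}_2^{\kappa}$ to $\mathbb{F}_2^{\eta}$ whose image is $\mathcal{U}$.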
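29.2.4 Hamming Distance and Hamming Weight

The Hamming distance $d_H(\mathbf{u}, \mathbf{v})$ between two binary $\kappa$-tuples $\mathbf{u}$ and $\mathbf{v}$ is defined
as the number of components in which they differ. For example, the Hamming
distance between the tuples (1,0,1,0) and (0,0,1,1) is two. It is easy to prove
that for $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{F}_2^{\kappa}$:

$$d_H(\mathbf{u}, \mathbf{v}) \ge 0 \ \text{with equality if, and only if, } \mathbf{u} = \mathbf{v}; \tag{29.11a}$$

$$d_H(\mathbf{u}, \mathbf{v}) = d_H(\mathbf{v}, \mathbf{u}); \tag{29.11b}$$

$$d_H(\mathbf{u}, \mathbf{w}) \le d_H(\mathbf{u}, \mathbf{v}) + d_H(\mathbf{v}, \mathbf{w}). \tag{29.11c}$$

The Hamming weight $w_H(\mathbf{u})$ of a binary $\kappa$-tuple $\mathbf{u}$ is defined as the number of
its nonzero components. Thus,

$$w_H(\mathbf{u}) = d_H(\mathbf{u}, \mathbf{0}), \quad \mathbf{u} \in \mathbb{F}_2^{\kappa}, \tag{29.12}$$

and

$$d_H(\mathbf{u}, \mathbf{v}) = w_H(\mathbf{u} \oplus \mathbf{v}), \quad \mathbf{u}, \mathbf{v} \in \mathbb{F}_2^{\kappa}. \tag{29.13}$$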
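The definitions above are easy to experiment with. The following is a minimal Python sketch, assuming binary tuples are represented as tuples of the integers 0 and 1; the function names are illustrative and not part of the text.

```python
def hamming_weight(u):
    """Number of nonzero components of a binary tuple."""
    return sum(1 for bit in u if bit != 0)


def hamming_distance(u, v):
    """Number of components in which two equal-length binary tuples differ."""
    if len(u) != len(v):
        raise ValueError("tuples must have the same length")
    return sum(1 for a, b in zip(u, v) if a != b)


def xor_tuples(u, v):
    """Componentwise mod-2 sum of two binary tuples, as in (29.5)."""
    return tuple(a ^ b for a, b in zip(u, v))


# The example from the text: the two tuples differ in the first and last component.
print(hamming_distance((1, 0, 1, 0), (0, 0, 1, 1)))  # prints 2
```

With this representation, (29.13) corresponds to `hamming_distance(u, v) == hamming_weight(xor_tuples(u, v))`.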



29.2.5 The Componentwise Antipodal Mapping 

The antipodal mapping $\Upsilon \colon \mathbb{F}_2 \to \{-1, +1\}$ maps the zero element of $\mathbb{F}_2$ to the
real number +1 and the unit element of $\mathbb{F}_2$ to −1:

$$\Upsilon(0) = +1, \quad \Upsilon(1) = -1. \tag{29.14}$$

This rule is not as arbitrary as it may seem. Although one might be somewhat
surprised that we do not map $1 \in \mathbb{F}_2$ to +1, we have our reasons. We prefer the
mapping (29.14) because it maps mod-2 sums to real products. Thus,

$$\Upsilon(a \oplus b) = \Upsilon(a)\,\Upsilon(b), \quad a, b \in \mathbb{F}_2, \tag{29.15}$$

where the operation on the RHS between $\Upsilon(a)$ and $\Upsilon(b)$ is the regular real-numbers
multiplication. This extends by induction to any finite number of elements of $\mathbb{F}_2$:

$$\Upsilon(c_1 \oplus c_2 \oplus \cdots \oplus c_\eta) = \prod_{\ell=1}^{\eta} \Upsilon(c_\ell), \quad c_1, \ldots, c_\eta \in \mathbb{F}_2. \tag{29.16}$$

The componentwise antipodal mapping $\Upsilon_\eta \colon \mathbb{F}_2^{\eta} \to \{-1, +1\}^{\eta}$ maps elements
of $\mathbb{F}_2^{\eta}$ to elements of $\{-1, +1\}^{\eta}$ by applying the mapping (29.14) to each component:

$$\Upsilon_\eta \colon (c_1, \ldots, c_\eta) \mapsto \bigl(\Upsilon(c_1), \ldots, \Upsilon(c_\eta)\bigr). \tag{29.17}$$

For example, $\Upsilon_3$ maps the triplet (0,0,1) to (+1,+1,−1).
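As an illustration, the antipodal mapping and the product property (29.16) can be checked numerically. This is a sketch only: bits are assumed to be the Python integers 0 and 1, and the names `antipodal` and `antipodal_tuple` are hypothetical.

```python
from functools import reduce
from itertools import product
from math import prod


def antipodal(bit):
    """The mapping (29.14): 0 -> +1 and 1 -> -1."""
    return 1 - 2 * bit


def antipodal_tuple(c):
    """The componentwise mapping (29.17)."""
    return tuple(antipodal(b) for b in c)


# Exhaustive check of (29.16) for all binary 4-tuples.
for c in product((0, 1), repeat=4):
    mod2_sum = reduce(lambda a, b: a ^ b, c)
    assert antipodal(mod2_sum) == prod(antipodal_tuple(c))
```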
29.2.6 Hamming Distance and Euclidean Distance

We next relate the Hamming distance $d_H(\mathbf{u}, \mathbf{v})$ between any two binary $\eta$-tuples
$\mathbf{u} = (u_1, \ldots, u_\eta)$ and $\mathbf{v} = (v_1, \ldots, v_\eta)$ to the squared Euclidean distance between
the results of applying the componentwise antipodal mapping $\Upsilon_\eta$ to them. We
argue that

$$d_E^2\bigl(\Upsilon_\eta(\mathbf{u}), \Upsilon_\eta(\mathbf{v})\bigr) = 4\, d_H(\mathbf{u}, \mathbf{v}), \quad \mathbf{u}, \mathbf{v} \in \mathbb{F}_2^{\eta}, \tag{29.18}$$

where $d_E(\cdot, \cdot)$ denotes the Euclidean distance, so

$$d_E^2\bigl(\Upsilon_\eta(\mathbf{u}), \Upsilon_\eta(\mathbf{v})\bigr) = \sum_{\nu=1}^{\eta} \bigl(\Upsilon(u_\nu) - \Upsilon(v_\nu)\bigr)^2. \tag{29.19}$$

To prove this relationship it suffices to consider the case where $\eta = 1$, because the
Hamming distance is the sum of the Hamming distances between the respective
components, and likewise for the squared Euclidean distance. To prove this result
for $\eta = 1$ we note that if the Hamming distance is zero, then $u$ and $v$ are identical
and hence so are $\Upsilon(u)$ and $\Upsilon(v)$, so the Euclidean distance between them must be
zero. And if the Hamming distance is one, then $u \neq v$, and hence $\Upsilon(u)$ and $\Upsilon(v)$
are of opposite sign but of equal unit magnitude, so the squared Euclidean distance
between them is four.
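The relation (29.18) can also be confirmed by exhaustive enumeration for small tuple lengths. The sketch below assumes the same 0/1 integer representation of bits; all function names are illustrative.

```python
from itertools import product


def antipodal_tuple(c):
    """Componentwise antipodal mapping: 0 -> +1, 1 -> -1."""
    return tuple(1 - 2 * b for b in c)


def squared_euclidean_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def hamming_distance(u, v):
    return sum(1 for a, b in zip(u, v) if a != b)


def relation_holds(eta):
    """Check (29.18) for every pair of binary eta-tuples."""
    tuples = list(product((0, 1), repeat=eta))
    return all(
        squared_euclidean_distance(antipodal_tuple(u), antipodal_tuple(v))
        == 4 * hamming_distance(u, v)
        for u in tuples
        for v in tuples
    )
```

The check is exponential in `eta` and is meant only as a sanity test of the identity, not as a proof.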



29.3 Binary Linear Encoders and Codes 

Definition 29.3.1 (Linear (K, N) $\mathbb{F}_2$ Encoder and Code). Let N and K be positive
integers.

(i) A linear (K, N) $\mathbb{F}_2$ encoder is a one-to-one linear mapping from $\mathbb{F}_2^{K}$ to $\mathbb{F}_2^{N}$.
(ii) A linear (K, N) $\mathbb{F}_2$ code is a linear subspace of $\mathbb{F}_2^{N}$ of dimension K.¹

In both definitions N is called the blocklength and K is called the dimension.

For example, the (K, K + 1) systematic single parity check encoder is the
mapping

$$(d_1, \ldots, d_K) \mapsto (d_1, \ldots, d_K, \, d_1 \oplus d_2 \oplus \cdots \oplus d_K). \tag{29.20}$$

It appends to the data tuple a single bit that is chosen so that the resulting (K + 1)-tuple
is of even Hamming weight. The (K, K + 1) single parity check code is the
subset of $\mathbb{F}_2^{K+1}$ consisting of those binary (K + 1)-tuples whose Hamming weight is
even.

Recall that the image of a mapping $g \colon \mathcal{A} \to \mathcal{B}$ is the subset of $\mathcal{B}$ comprising those
elements $y \in \mathcal{B}$ to which there corresponds some $x \in \mathcal{A}$ such that $g(x) = y$.



¹ The terminology here is not standard. In the Coding Theory literature a linear (K, N) $\mathbb{F}_2$
code is often called a "binary linear [N, K] code."
Proposition 29.3.2 ($\mathbb{F}_2$ Encoders and Codes).

(i) If $T \colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$ is a linear (K, N) $\mathbb{F}_2$ encoder, then its image is a linear
(K, N) $\mathbb{F}_2$ code.

(ii) Every linear (K, N) $\mathbb{F}_2$ code is the image of some (nonunique) linear (K, N)
$\mathbb{F}_2$ encoder.

Proof. We begin with Part (i). Let $T \colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$ be a linear (K, N) $\mathbb{F}_2$ encoder.
That its image is a linear subspace of $\mathbb{F}_2^{N}$ follows from Proposition 29.2.1 (iii).
That its dimension must be K follows from Proposition 29.2.1 (iv) (see (29.10))
because the fact that $T(\cdot)$ is one-to-one implies, by Proposition 29.2.1 (ii), that
$\operatorname{Ker}(T) = \{\mathbf{0}\}$, so $\operatorname{Dim}(\operatorname{Ker}(T)) = 0$.

To prove Part (ii) we note that $\mathbb{F}_2^{K}$ is of dimension K and that, by definition, every
linear (K, N) $\mathbb{F}_2$ code is also of dimension K. The result now follows by noting
that there exists a one-to-one linear mapping between any two subspaces of equal
dimensions over the same field (Proposition 29.2.1 (v)). □

Any linear transformation from a finite-dimensional space to a finite-dimensional 
space can be represented as matrix multiplication. A linear (K,N) F 2 encoder is 
no exception. What is perhaps unusual is that coding theorists use row vectors 
to denote the data K-tuples and the N-tuples to which they are mapped. They 
consequently use matrix multiplication from the left. This tradition is so ingrained 
that we shall begrudgingly adopt it. 

Definition 29.3.3 (Matrix Representation of an Encoder). We say that the linear
(K, N) $\mathbb{F}_2$ encoder $T \colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$ is represented by the matrix G if G is a K × N
matrix whose elements are in $\mathbb{F}_2$ and

$$T(\mathbf{d}) = \mathbf{d}\mathsf{G}, \quad \mathbf{d} \in \mathbb{F}_2^{K}. \tag{29.21}$$

Note that in the matrix multiplication in (29.21) we use $\mathbb{F}_2$ arithmetic, so the $\eta$-th
component of $\mathbf{d}\mathsf{G}$ is given by $d^{(1)} \cdot g^{(1,\eta)} \oplus \cdots \oplus d^{(K)} \cdot g^{(K,\eta)}$, where $g^{(\kappa,\eta)}$ is the Row-$\kappa$
Column-$\eta$ component of the matrix G, and where $d^{(\kappa)}$ is the $\kappa$-th component of $\mathbf{d}$.

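For example, the (K, K + 1) $\mathbb{F}_2$ systematic single parity check encoder (29.20) is
represented by the K × (K + 1) matrix

$$\mathsf{G} = \begin{pmatrix} 1 & 0 & \cdots & 0 & 1 \\ 0 & 1 & \cdots & 0 & 1 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 1 \end{pmatrix}. \tag{29.22}$$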

The matrix G in (29.21) is uniquely specified by the linear transformation $T(\cdot)$:
its $\kappa$-th row is the result of applying $T(\cdot)$ to the K-tuple (0, . . . , 0, 1, 0, . . . , 0) (the
K-tuple whose components are all zero except for the $\kappa$-th, which is one).

Moreover, every K × N binary matrix G defines a linear transformation $T(\cdot)$ via
(29.21), but this linear transformation need not be one-to-one. It is one-to-one if,
and only if, the subspace of $\mathbb{F}_2^{N}$ spanned by the rows of G is of dimension K.
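The matrix representation can be made concrete with a small Python sketch. It assumes that G is stored as a list of K rows, each a list of N bits, and it tests injectivity by brute force, which is feasible only for small K; the function names are hypothetical.

```python
from itertools import product


def encode(d, G):
    """Row vector d times matrix G using F2 arithmetic, as in (29.21)."""
    n = len(G[0])
    return tuple(sum(d[k] & G[k][j] for k in range(len(d))) % 2 for j in range(n))


def single_parity_check_generator(K):
    """K x (K+1) matrix [identity | all-ones column] representing the encoder (29.20)."""
    return [[1 if j == k else 0 for j in range(K)] + [1] for k in range(K)]


def is_one_to_one(G):
    """Brute-force check that d -> dG maps distinct data tuples to distinct outputs."""
    K = len(G)
    images = {encode(d, G) for d in product((0, 1), repeat=K)}
    return len(images) == 2 ** K


G = single_parity_check_generator(3)
print(encode((1, 0, 1), G))  # prints (1, 0, 1, 0)
```

A matrix with two identical rows, such as `[[1, 1], [1, 1]]`, fails the `is_one_to_one` check because its rows span a subspace of dimension one.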
Definition 29.3.4 (Generator Matrix). A matrix G is a generator matrix for a
given linear (K, N) $\mathbb{F}_2$ code if G is a binary K × N matrix such that the image of
the mapping $\mathbf{d} \mapsto \mathbf{d}\mathsf{G}$ is the given code.

Note that there may be numerous generator matrices for a given code. For example, 
the matrix (29.22) is a generator matrix for the single parity check code. But there 
are others. Indeed, replacing any row of the above matrix by the sum of that row 
and another different row results in another generator matrix for this code. 

Coding theorists like to distinguish between a code property and an encoder 
property. Code properties are properties that are common to all encoders of the 
same image. Encoder properties are specific to an encoder. Examples of code 
properties are the blocklength and dimension. We shall soon encounter more. An 
example of an encoder property is the property of being systematic: 

Definition 29.3.5 (Systematic Encoder). A linear (K, N) $\mathbb{F}_2$ encoder $T \colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$
is said to be systematic (or strictly systematic) if, for every K-tuple $(d_1, \ldots, d_K)$
in $\mathbb{F}_2^{K}$, the first K components of $T\bigl((d_1, \ldots, d_K)\bigr)$ are equal to $d_1, \ldots, d_K$.

For example, the encoder (29.20) is systematic. An encoder whose image is the
single parity check code and which is not systematic is the encoder

$$(d_1, \ldots, d_K) \mapsto (d_1, \, d_1 \oplus d_2, \, d_2 \oplus d_3, \, \ldots, \, d_{K-1} \oplus d_K, \, d_K). \tag{29.23}$$

The reader is encouraged to verify that if a linear (K, N) $\mathbb{F}_2$ encoder $T \colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$
is represented by the matrix G, then $T(\cdot)$ is systematic if, and only if, the K × K
matrix that results from deleting the last N − K columns of G is the K × K identity
matrix.

Definition 29.3.6 (Parity-Check Matrix). A parity-check matrix for a given
linear (K, N) $\mathbb{F}_2$ code is an (N − K) × N matrix H such that a (row) N-tuple $\mathbf{c}$ is in the
code if, and only if, $\mathbf{c}\mathsf{H}^{\mathsf{T}}$ is the all-zero (row) vector.

For example, a parity-check matrix for the (K, K + 1) single parity check code is
the 1 × (K + 1) matrix

$$\mathsf{H} = (1, 1, \ldots, 1).$$

(Codes typically have numerous different parity-check matrices, but the single
parity check code is an exception.)
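To see the parity-check condition at work, the following Python sketch computes $\mathbf{c}\mathsf{H}^{\mathsf{T}}$ over $\mathbb{F}_2$ and lists the tuples that pass the check for the all-ones matrix above with K = 3. The representation of H as a list of rows and the names `syndrome` and `in_code` are assumptions made for illustration.

```python
from itertools import product


def syndrome(c, H):
    """Compute c H^T over F2; H is a list of rows, each of the same length as c."""
    return tuple(sum(ci & hi for ci, hi in zip(c, row)) % 2 for row in H)


def in_code(c, H):
    """A tuple is in the code exactly when its syndrome is all-zero."""
    return not any(syndrome(c, H))


K = 3
H = [[1] * (K + 1)]  # parity-check matrix of the (K, K+1) single parity check code
codewords = [c for c in product((0, 1), repeat=K + 1) if in_code(c, H)]
print(len(codewords))  # prints 8
```

The eight tuples found are exactly the binary 4-tuples of even Hamming weight.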



29.4 Binary Encoders with Antipodal Signaling 
Definition 29.4.1.

(i) We say that a (K, N) binary-to-reals block encoder $\text{enc} \colon \{0, 1\}^{K} \to \mathbb{R}^{N}$ is a
linear binary (K, N) block encoder with antipodal signaling if

$$\text{enc}(\mathbf{d}) = \Upsilon_N\bigl(T(\mathbf{d})\bigr), \quad \mathbf{d} \in \mathbb{F}_2^{K}, \tag{29.24}$$

where $T \colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$ is a linear (K, N) $\mathbb{F}_2$ encoder, and where $\Upsilon_N(\cdot)$ is the
componentwise antipodal mapping (29.17). Thus, if $(X_1, \ldots, X_N)$ denotes
the N-tuple produced by enc(·) when fed the data K-tuple $(D_1, \ldots, D_K)$, then

$$X_\eta = \begin{cases} +1 & \text{if the } \eta\text{-th component of } T\bigl((D_1, \ldots, D_K)\bigr) \text{ is zero,} \\ -1 & \text{otherwise.} \end{cases}$$

(ii) A linear binary (K, N) block code with antipodal signaling is the image
of some linear binary (K, N) block encoder with antipodal signaling.

In analogy to Proposition 29.3.2, the image of every linear binary (K, N) block
encoder with antipodal signaling is a linear binary (K, N) block code with antipodal
signaling.

If enc(·) can be represented by the application of $T(\cdot)$ to the data K-tuple followed
by the application of the componentwise antipodal mapping $\Upsilon_N$, then we shall
write

$$\text{enc} = \Upsilon_N \circ T. \tag{29.26}$$

Since $\Upsilon_N$ is invertible, there is a one-to-one correspondence between $T$ and enc.

An important code property is the distribution of the result of applying an encoder 
to IID random bits. 

Proposition 29.4.2. Let $T \colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$ be a linear (K, N) $\mathbb{F}_2$ encoder.

(i) Applying $T$ to a K-tuple of IID random bits results in a random N-tuple that
is uniformly distributed over Image(T).

(ii) Applying $\Upsilon_N \circ T$ to IID random bits produces an N-tuple that is uniformly
distributed over the image of Image(T) under the componentwise antipodal
mapping $\Upsilon_N$.

Proof. Part (i) follows from the fact that the mapping $T(\cdot)$ is one-to-one. Part (ii)
follows from Part (i) and from the fact that $\Upsilon_N(\cdot)$ is one-to-one. □

For example, it follows from Proposition 29.4.2 (ii) and from (29.16) that if we
feed IID random bits to any encoder (be it systematic or not) whose image is the
(K, K + 1) single parity check code and then employ the componentwise antipodal
mapping $\Upsilon_{K+1}(\cdot)$, then the resulting random (K + 1)-tuple $(X_1, \ldots, X_{K+1})$ will be
uniformly distributed over the set

$$\Bigl\{ (\xi_1, \ldots, \xi_{K+1}) \in \{-1, +1\}^{K+1} : \prod_{\eta=1}^{K+1} \xi_\eta = +1 \Bigr\}.$$

Corollary 29.4.3. Any property that is determined by the joint distribution of the 
result of applying the encoder to IID random bits is a code property. 

Examples of such properties are the power and operational power spectral density, 
which are discussed next. 




29.5 Power and Operational Power Spectral Density 

To discuss the transmitted power and the operational power spectral density we
shall consider bi-infinite block encoding (Section 14.5.2). We shall then use the
results of Section 14.5.2 and Section 15.4.3 to compute the power and operational
PSD of the transmitted signal in this mode.

The impatient reader who is only interested in the transmitted power for pulse
shapes satisfying the orthogonality condition (29.2) can apply the results of
Section 14.5.3 directly to obtain that, subject to the decay condition (14.46), the
transmitted power P is given by

$$P = \frac{A^2}{T_s}. \tag{29.27}$$

We next extend the discussion to general pulse shapes and to the operational PSD.
To remind the reader that we no longer assume the orthogonality condition (29.2),
we shall now denote the pulse shape by $g(\cdot)$ and assume that it is bandlimited
to W Hz and that it satisfies the decay condition (14.17). Before proceeding with
the analysis of the power and PSD, we wish to characterize linear binary (K, N)
block encoders with antipodal signaling that map IID random bits to zero-mean
N-tuples. Note that by Corollary 29.4.3 this is, in fact, a code property. Thus, if
$\text{enc} = \Upsilon_N \circ T$, then the question of whether enc(·) maps IID random bits to zero-mean
N-tuples depends only on the image of $T$. Aiding us in this characterization
is the following lemma on linear functionals. A linear functional on $\mathbb{F}_2^{K}$ is a linear
mapping from $\mathbb{F}_2^{K}$ to $\mathbb{F}_2$. The zero functional maps every K-tuple in $\mathbb{F}_2^{K}$ to zero.

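Lemma 29.5.1. Let $L \colon \mathbb{F}_2^{K} \to \mathbb{F}_2$ be a linear functional that is not the zero
functional. Then the RV $X$ defined by

$$X = \begin{cases} +1 & \text{if } L\bigl((D_1, \ldots, D_K)\bigr) = 0, \\ -1 & \text{if } L\bigl((D_1, \ldots, D_K)\bigr) = 1 \end{cases}$$

is of zero mean whenever $D_1, \ldots, D_K$ are IID random bits.

Proof. We begin by expressing the expectation of $X$ as

$$\begin{aligned} \mathsf{E}[X] &= \sum_{\mathbf{d} \in \mathbb{F}_2^{K}} \Pr[\mathbf{D} = \mathbf{d}] \, \Upsilon\bigl(L(\mathbf{d})\bigr) \\ &= 2^{-K} \sum_{\mathbf{d} \in \mathbb{F}_2^{K}} \Upsilon\bigl(L(\mathbf{d})\bigr) \\ &= 2^{-K} \sum_{\mathbf{d} \in \mathbb{F}_2^{K} : L(\mathbf{d}) = 0} (+1) \; + \; 2^{-K} \sum_{\mathbf{d} \in \mathbb{F}_2^{K} : L(\mathbf{d}) = 1} (-1) \\ &= 2^{-K} \bigl( \# L^{-1}(0) - \# L^{-1}(1) \bigr), \end{aligned}$$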

where

$$L^{-1}(0) = \bigl\{ \mathbf{d} \in \mathbb{F}_2^{K} : L(\mathbf{d}) = 0 \bigr\}$$

is the set of all K-tuples in $\mathbb{F}_2^{K}$ that are mapped by $L(\cdot)$ to 0, where $L^{-1}(1)$ is
analogously defined, and where $\#\mathcal{A}$ denotes the number of elements in the set $\mathcal{A}$. It
follows that to prove that $\mathsf{E}[X] = 0$ it suffices to show that if $L(\cdot)$ is not the zero
functional, then the sets $L^{-1}(0)$ and $L^{-1}(1)$ have the same number of elements.
We prove this by exhibiting a one-to-one mapping from $L^{-1}(0)$ onto $L^{-1}(1)$. (If
there is a one-to-one mapping from a finite set $\mathcal{A}$ onto a finite set $\mathcal{B}$, then $\mathcal{A}$ and $\mathcal{B}$
must have the same number of elements.) To exhibit this mapping, note that the
assumption that $L(\cdot)$ is not the zero transformation implies that the set $L^{-1}(1)$ is
not empty. Let $\mathbf{d}_*$ be an element of this set, so

$$L(\mathbf{d}_*) = 1. \tag{29.29}$$

The required mapping maps each $\mathbf{d}_0 \in L^{-1}(0)$ to $\mathbf{d}_0 \oplus \mathbf{d}_*$:

$$L^{-1}(0) \ni \mathbf{d}_0 \mapsto \mathbf{d}_0 \oplus \mathbf{d}_*. \tag{29.30}$$

We next verify that it is a one-to-one mapping from $L^{-1}(0)$ onto $L^{-1}(1)$. That it is
one-to-one follows because if $\mathbf{d}_0 \oplus \mathbf{d}_* = \mathbf{d}_0' \oplus \mathbf{d}_*$ then by adding $\mathbf{d}_*$ to both sides we
obtain $\mathbf{d}_0 \oplus \mathbf{d}_* \oplus \mathbf{d}_* = \mathbf{d}_0' \oplus \mathbf{d}_* \oplus \mathbf{d}_*$, i.e., that $\mathbf{d}_0 = \mathbf{d}_0'$ (because $\mathbf{d}_* \oplus \mathbf{d}_* = \mathbf{0}$). That
this mapping maps each element of $L^{-1}(0)$ to an element of $L^{-1}(1)$ follows because,
as we next show, if $\mathbf{d}_0 \in L^{-1}(0)$, then $L(\mathbf{d}_0 \oplus \mathbf{d}_*) = 1$. Indeed, if $\mathbf{d}_0 \in L^{-1}(0)$, then

$$L(\mathbf{d}_0) = 0, \tag{29.31}$$

and consequently,

$$\begin{aligned} L(\mathbf{d}_0 \oplus \mathbf{d}_*) &= L(\mathbf{d}_0) \oplus L(\mathbf{d}_*) \\ &= 0 \oplus 1 \\ &= 1, \end{aligned}$$

where the first equality follows from the linearity of $L(\cdot)$, and where the second
equality follows from (29.29) and (29.31). That the mapping is onto follows by
noting that if $\mathbf{d}_1$ is any element of $L^{-1}(1)$, then $\mathbf{d}_1 \oplus \mathbf{d}_*$ is in $L^{-1}(0)$ and it is
mapped by this mapping to $\mathbf{d}_1$. □
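The lemma can be checked by exhaustive enumeration for small K. The sketch below relies on the fact that every linear functional on $\mathbb{F}_2^{K}$ is a mod-2 inner product with some fixed coefficient K-tuple; the function name and the representation of the functional by its coefficients are illustrative choices.

```python
from itertools import product


def mean_of_antipodal_functional(coeffs):
    """E[Upsilon(L(D))] for L(d) = sum_k coeffs[k]*d[k] mod 2 and D uniform over F2^K."""
    K = len(coeffs)
    total = 0
    for d in product((0, 1), repeat=K):
        value = sum(c & b for c, b in zip(coeffs, d)) % 2
        total += 1 - 2 * value  # +1 when L(d) = 0, -1 when L(d) = 1
    return total / 2 ** K


# The zero functional yields mean +1; every other functional on F2^4 yields mean 0.
for coeffs in product((0, 1), repeat=4):
    expected = 1.0 if not any(coeffs) else 0.0
    assert mean_of_antipodal_functional(coeffs) == expected
```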

Using this lemma we can show: 

Proposition 29.5.2. Let $(X_1, \ldots, X_N)$ be the result of applying a linear binary
(K, N) block encoder with antipodal signaling to a binary K-tuple comprising IID
random bits.

(i) For every $\eta \in \{1, \ldots, N\}$, the RV $X_\eta$ is either deterministically equal to +1,
or else of zero mean.

(ii) For every $\eta, \eta' \in \{1, \ldots, N\}$, the random variables $X_\eta$ and $X_{\eta'}$ are either
deterministically equal to each other or else $\mathsf{E}[X_\eta X_{\eta'}] = 0$.

Proof. Let the linear binary (K, N) block encoder with antipodal signaling enc(·)
be given by $\text{enc} = \Upsilon_N \circ T$, where $T \colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$ is one-to-one and linear. Let
$(X_1, \ldots, X_N)$ be the result of applying enc to the K-tuple $\mathbf{D} = (D_1, \ldots, D_K)$,
where $D_1, \ldots, D_K$ are IID random bits.

To prove Part (i), fix some $\eta \in \{1, \ldots, N\}$, and let $L(\cdot)$ be the linear functional that
maps $\mathbf{d}$ to the $\eta$-th component of $T(\mathbf{d})$, so $X_\eta = \Upsilon\bigl(L(\mathbf{D})\bigr)$, where $\mathbf{D}$ denotes the
row vector comprising the K IID random bits. If $L(\cdot)$ maps all data K-tuples to zero,
then $X_\eta$ is deterministically equal to +1. Otherwise, $\mathsf{E}[X_\eta] = 0$ by Lemma 29.5.1.

To prove Part (ii), let the matrix G represent the mapping $T(\cdot)$, so $X_\eta = \Upsilon\bigl(\mathbf{D}\mathsf{G}^{(\cdot,\eta)}\bigr)$,
where $\mathsf{G}^{(\cdot,\eta)}$ denotes the $\eta$-th column of G. Expressing $X_{\eta'}$ in a similar way, we
obtain from (29.15)

$$\begin{aligned} X_\eta X_{\eta'} &= \Upsilon\bigl(\mathbf{D}\mathsf{G}^{(\cdot,\eta)}\bigr) \, \Upsilon\bigl(\mathbf{D}\mathsf{G}^{(\cdot,\eta')}\bigr) \\ &= \Upsilon\bigl(\mathbf{D}\mathsf{G}^{(\cdot,\eta)} \oplus \mathbf{D}\mathsf{G}^{(\cdot,\eta')}\bigr) \\ &= \Upsilon\Bigl(\mathbf{D}\bigl(\mathsf{G}^{(\cdot,\eta)} \oplus \mathsf{G}^{(\cdot,\eta')}\bigr)\Bigr). \end{aligned} \tag{29.32}$$

Consequently, if we define the linear functional $L \colon \mathbf{d} \mapsto \mathbf{d}\bigl(\mathsf{G}^{(\cdot,\eta)} \oplus \mathsf{G}^{(\cdot,\eta')}\bigr)$, then
$X_\eta X_{\eta'} = \Upsilon\bigl(L(\mathbf{D})\bigr)$. This linear functional is the zero functional if the $\eta$-th column
of G is identical to its $\eta'$-th column, i.e., if $X_\eta$ is deterministically equal to $X_{\eta'}$.
Otherwise, it is not the zero functional, and $\mathsf{E}[X_\eta X_{\eta'}]$ $\bigl(= \mathsf{E}\bigl[\Upsilon(L(\mathbf{D}))\bigr]\bigr)$ must be
zero (Lemma 29.5.1). □

Proposition 29.5.3 (Producing Zero-Mean Uncorrelated Symbols). A linear binary
(K, N) block encoder with antipodal signaling $\text{enc} = \Upsilon_N \circ T$ produces zero-mean
uncorrelated symbols when fed IID random bits if, and only if, the columns
of the matrix G representing $T(\cdot)$ are distinct and none of these columns is the
all-zero column.

Proof. The $\eta$-th symbol $X_\eta$ produced by $\text{enc} = \Upsilon_N \circ T$ when fed the K-tuple of
IID random bits $\mathbf{D} = (D_1, \ldots, D_K)$ is given by

$$\begin{aligned} X_\eta &= \Upsilon\bigl(\mathbf{D}\mathsf{G}^{(\cdot,\eta)}\bigr) \\ &= \Upsilon\bigl(D_1 \cdot \mathsf{G}^{(1,\eta)} \oplus \cdots \oplus D_K \cdot \mathsf{G}^{(K,\eta)}\bigr), \end{aligned}$$

where $\mathsf{G}^{(\cdot,\eta)}$ is the $\eta$-th column of the K × N matrix G representing $T(\cdot)$. Since the
linear functional

$$\mathbf{d} \mapsto d_1 \cdot \mathsf{G}^{(1,\eta)} \oplus \cdots \oplus d_K \cdot \mathsf{G}^{(K,\eta)}$$

is the zero functional if, and only if,

$$\mathsf{G}^{(1,\eta)} = \cdots = \mathsf{G}^{(K,\eta)} = 0, \tag{29.33}$$

it follows that $X_\eta$ is deterministically equal to +1 if, and only if, the $\eta$-th column of G is
zero. From this and Lemma 29.5.1 it follows that all the symbols produced by enc
are of zero mean if, and only if, none of the columns of G is zero.

A similar argument shows that the product $X_\eta X_{\eta'}$, which by (29.32) is given by

$$\Upsilon\Bigl(\mathbf{D}\bigl(\mathsf{G}^{(\cdot,\eta)} \oplus \mathsf{G}^{(\cdot,\eta')}\bigr)\Bigr),$$

is deterministically equal to +1 if, and only if, the functional

$$\mathbf{d} \mapsto d_1 \cdot \bigl(\mathsf{G}^{(1,\eta)} \oplus \mathsf{G}^{(1,\eta')}\bigr) \oplus \cdots \oplus d_K \cdot \bigl(\mathsf{G}^{(K,\eta)} \oplus \mathsf{G}^{(K,\eta')}\bigr)$$

is zero, i.e., if, and only if, the $\eta$-th and $\eta'$-th columns of G are equal. Otherwise,
by Lemma 29.5.1, we have $\mathsf{E}[X_\eta X_{\eta'}] = 0$. □
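The column criterion of Proposition 29.5.3 can be compared against a direct computation of the means and correlations. The following Python sketch does so by brute force; it assumes G is a list of K rows of bits, and both function names are hypothetical.

```python
from itertools import product


def has_distinct_nonzero_columns(G):
    """The criterion of the proposition: no zero column and no repeated column."""
    cols = [tuple(row[j] for row in G) for j in range(len(G[0]))]
    return all(any(c) for c in cols) and len(set(cols)) == len(cols)


def symbols_zero_mean_uncorrelated(G):
    """Brute force: E[X_j] = 0 for all j and E[X_j X_i] = 0 for j != i, D uniform."""
    K, N = len(G), len(G[0])
    outputs = []
    for d in product((0, 1), repeat=K):
        c = [sum(d[k] & G[k][j] for k in range(K)) % 2 for j in range(N)]
        outputs.append([1 - 2 * b for b in c])
    means_ok = all(sum(x[j] for x in outputs) == 0 for j in range(N))
    corr_ok = all(
        sum(x[j] * x[i] for x in outputs) == 0
        for j in range(N)
        for i in range(j + 1, N)
    )
    return means_ok and corr_ok


# Generator of the (2, 3) systematic single parity check encoder.
G = [[1, 0, 1], [0, 1, 1]]
print(has_distinct_nonzero_columns(G), symbols_zero_mean_uncorrelated(G))  # prints True True
```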



Note 29.5.4. By Corollary 29.4.3 the property of producing zero-mean uncorrelated
symbols is a code property.

Proposition 29.5.5 (Power and PSD). Let the linear binary (K, N) block encoder
with antipodal signaling $\text{enc} = \Upsilon_N \circ T$ produce zero-mean uncorrelated symbols
when fed IID random bits, and let the pulse shape $g$ satisfy the decay condition
(14.17). Then the transmitted power P in bi-infinite block-encoding mode is given
by

$$P = \frac{A^2}{T_s} \, \|g\|_2^2 \tag{29.34}$$

and the operational PSD is

$$S_{XX}(f) = \frac{A^2}{T_s} \, \bigl|\hat{g}(f)\bigr|^2, \quad f \in \mathbb{R}. \tag{29.35}$$

Proof. The expression (29.34) for the power follows either from (14.33) or (14.38).
The expression for the operational PSD follows either from (15.20) or from (15.23). □



Engineers rarely check whether an encoder produces uncorrelated symbols when
fed IID random bits. The reason may be that they usually deal with pulse shapes $\phi$
satisfying the orthogonality condition (29.2) and the decay condition (14.46). For
such pulse shapes the power is given by (29.27) without any additional assumptions.
Also, by Theorem 15.4.1, the bandwidth of the PAM signal is typically equal to the
bandwidth of the pulse shape. In fact, by that theorem, for linear binary (K, N)
block encoders with antipodal signaling

$$\text{bandwidth of PAM signal} = \text{bandwidth of pulse shape}, \tag{29.36}$$

whenever $A \neq 0$; the pulse shape $g$ is a Borel measurable function satisfying the
decay condition (14.17) for some $\alpha, \beta > 0$; and the encoder produces zero-mean
symbols when fed IID random bits. Thus, if one is not interested in the exact form
of the operational PSD but only in its support, then one need not check whether
the encoder produces uncorrelated symbols when fed IID random bits.
29.6 Performance Criteria

Designing an optimal decoder for linear binary block encoders with antipodal
signaling is conceptually very simple but algorithmically very difficult. The structure
of the decoder depends on what we mean by "optimal." In this chapter we focus
on two notions of optimality: minimizing the probability of a block error (also
called message error) and minimizing the probability of a bit error. Referring
to Figure 28.1, we say that a block error occurred in decoding the $\nu$-th block if
at least one of the data bits $(D_{(\nu-1)K+1}, \ldots, D_{(\nu-1)K+K})$ was incorrectly decoded.
We say that a bit error occurred in decoding the $j$-th bit if $D_j$ was incorrectly
decoded.

We consider the case where IID random bits are transmitted in block-mode and
where the transmitted waveform is corrupted by additive Gaussian noise that is
white with respect to the bandwidth W of the pulse shape. The pulse shape is
assumed to satisfy the orthonormality condition (29.2) and the decay condition
(14.17). From Proposition 28.3.1 it follows that for both optimality criteria, there
is no loss in optimality in feeding the received waveform to a matched filter for $\phi$
and in basing the decision on the filter's output sampled at integer multiples of $T_s$.
Moreover, for the purposes of decoding a given message it suffices to consider only
the samples corresponding to the symbols that were produced when the encoder
encoded the given message (Proposition 28.4.2). Similarly, for decoding a given
data bit it suffices to consider only the samples corresponding to the symbols that
were produced when the encoder encoded the message of which the given bit is part.
These observations lead us (as in Section 28.4.3) to the discrete-time single-block
model (28.30). For convenience, we repeat this model here (with the additional
assumption that the data are IID random bits):

$$(X_1, \ldots, X_N) = \text{enc}(D_1, \ldots, D_K); \tag{29.37a}$$

$$Y_\eta = A X_\eta + Z_\eta, \quad \eta \in \{1, \ldots, N\}; \tag{29.37b}$$

$$Z_1, \ldots, Z_N \sim \text{IID } \mathcal{N}\Bigl(0, \frac{N_0}{2}\Bigr); \tag{29.37c}$$

$$D_1, \ldots, D_K \sim \text{IID } \mathcal{U}\bigl(\{0, 1\}\bigr), \tag{29.37d}$$

where $(Z_1, \ldots, Z_N)$ are independent of $(D_1, \ldots, D_K)$. We also introduce some
additional notation. We use $x_\eta(\mathbf{d})$ for the $\eta$-th component of the N-tuple to which
the binary K-tuple $\mathbf{d}$ is mapped by enc(·):

$$x_\eta(\mathbf{d}) \triangleq \eta\text{-th component of } \text{enc}(\mathbf{d}), \quad \bigl(\eta \in \{1, \ldots, N\}, \; \mathbf{d} \in \mathbb{F}_2^{K}\bigr). \tag{29.38}$$

Denoting the conditional density of $(Y_1, \ldots, Y_N)$ given $(X_1, \ldots, X_N)$ by $f_{\mathbf{Y}|\mathbf{X}}(\cdot)$,
we have for every $\mathbf{y} \in \mathbb{R}^{N}$ of components $y_1, \ldots, y_N$ and for every $\mathbf{x} \in \{-1, +1\}^{N}$
of components $x_1, \ldots, x_N$

$$f_{\mathbf{Y}|\mathbf{X}=\mathbf{x}}(\mathbf{y}) = (\pi N_0)^{-N/2} \prod_{\eta=1}^{N} \exp\Bigl(-\frac{(y_\eta - A x_\eta)^2}{N_0}\Bigr). \tag{29.39}$$
Parameter                    In Section 21.6                        In Section 29.7
---------------------------  -------------------------------------  ---------------------------------------
number of observations       $J$                                    $N$
number of hypotheses         $M$                                    $2^{K}$
set of hypotheses            $\{1, \ldots, M\}$                     $\mathbb{F}_2^{K}$
dummy hypothesis variable    $m$                                    $\mathbf{d}$
prior                        $\{\pi_m\}$                            uniform
conditional mean tuple       $(s_m^{(1)}, \ldots, s_m^{(J)})$       $(A x_1(\mathbf{d}), \ldots, A x_N(\mathbf{d}))$
conditional variance         $\sigma^2$                             $N_0/2$

Table 29.1: A conversion table for the setups of Section 21.6 and of Section 29.7.



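Likewise, for every $\mathbf{y} \in \mathbb{R}^{N}$ and every data tuple $\mathbf{d} \in \mathbb{F}_2^{K}$

$$f_{\mathbf{Y}|\mathbf{D}=\mathbf{d}}(\mathbf{y}) = (\pi N_0)^{-N/2} \prod_{\eta=1}^{N} \exp\Bigl(-\frac{\bigl(y_\eta - A x_\eta(\mathbf{d})\bigr)^2}{N_0}\Bigr). \tag{29.40}$$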



29.7 Minimizing the Block Error Rate 

29.7.1 Optimal Decoding 

To minimize the probability of a block error, we need to use the random N-vector
$\mathbf{Y} = (Y_1, \ldots, Y_N)$ to guess the K-tuple $\mathbf{D} = (D_1, \ldots, D_K)$. This is the type of
problem we addressed in Section 21.6. The translation between the setup of that
section and our current setup is summarized in Table 29.1: the number of
observations, which was given there by $J$, is here $N$; the number of hypotheses, which
was given there by $M$, is here $2^{K}$; the set of possible messages, which was given
there by $\mathcal{M} = \{1, \ldots, M\}$, is here the set of binary K-tuples $\mathbb{F}_2^{K}$; the dummy
variable for a generic message, which was given there by $m$, is here the binary
K-tuple $\mathbf{d}$; the prior, which was denoted there by $\{\pi_m\}$, is here uniform; the mean
tuple corresponding to the $m$-th message, which was given there by $(s_m^{(1)}, \ldots, s_m^{(J)})$,
is here $\bigl(A x_1(\mathbf{d}), \ldots, A x_N(\mathbf{d})\bigr)$ (see (29.38)); and the conditional variance of each
observation, which was given there by $\sigma^2$, is here $N_0/2$.

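Because all the symbols produced by the encoder take value in {−1, +1}, it follows
that

$$\sum_{\eta=1}^{N} \bigl(A x_\eta(\mathbf{d})\bigr)^2 = A^2 N, \quad \mathbf{d} \in \mathbb{F}_2^{K},$$

so all the mean tuples are of equal Euclidean norm. From Proposition 21.6.1 (iii)
we thus obtain that, to minimize the probability of a block error, our guess should
be the K-tuple $\mathbf{d}^*$ that satisfies

$$\sum_{\eta=1}^{N} x_\eta(\mathbf{d}^*) \, Y_\eta = \max_{\mathbf{d} \in \mathbb{F}_2^{K}} \sum_{\eta=1}^{N} x_\eta(\mathbf{d}) \, Y_\eta, \tag{29.41}$$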



with ties being resolved uniformly at random among the data tuples that achieve
the maximum. Our guess should thus be the data sequence that when fed to the
encoder produces the {±1}-valued N-tuple of highest correlation with the observed
tuple $\mathbf{Y}$. Note that, by definition, all block encoders are one-to-one mappings
and thus the mean tuples are distinct. Consequently, by Proposition 21.6.2, the
probability that more than one tuple $\mathbf{d}^*$ satisfies (29.41) is zero.

Since guessing the data tuple is equivalent to guessing the N-tuple to which it is 
mapped, we can also describe the optimal decision rule in terms of the encoder's 
output. 

Proposition 29.7.1 (The Max-Correlation Decision Rule). Consider the problem
of guessing $\mathbf{D}$ based on $\mathbf{Y}$ for the setup of Section 29.6.

(i) Picking at random a message from the set

$$\Bigl\{ \tilde{\mathbf{d}} \in \mathbb{F}_2^{K} : \sum_{\eta=1}^{N} x_\eta(\tilde{\mathbf{d}}) \, Y_\eta = \max_{\mathbf{d} \in \mathbb{F}_2^{K}} \sum_{\eta=1}^{N} x_\eta(\mathbf{d}) \, Y_\eta \Bigr\} \tag{29.42}$$

minimizes the probability of incorrectly guessing $\mathbf{D}$.

(ii) The probability that the above set contains more than one element is zero.

(iii) For the problem of guessing the encoder's output, picking at random an
N-tuple from the set

$$\Bigl\{ \tilde{\mathbf{x}} \in \operatorname{Image}(\text{enc}) : \sum_{\eta=1}^{N} \tilde{x}_\eta \, Y_\eta = \max_{\mathbf{x} \in \operatorname{Image}(\text{enc})} \sum_{\eta=1}^{N} x_\eta \, Y_\eta \Bigr\} \tag{29.43}$$

minimizes the probability of error. This set contains more than one element
with probability zero.

Conceptually, the problem of finding an N-tuple that has the highest correlation
with $(Y_1, \ldots, Y_N)$ among all the N-tuples in the image of enc(·) is very simple: one
goes over the list of all the $2^{K}$ N-tuples that are in the image of enc(·) and picks
the one that has the highest correlation with $(Y_1, \ldots, Y_N)$. But algorithmically
this is very difficult because $2^{K}$ is in most applications a huge number. It is one of
the challenges of Coding Theory to come up with encoders for which the decoding
does not require an exhaustive search over all $2^{K}$ tuples. As we shall see, the
single parity check code is an example of such a code. But the performance of this
encoder is, alas, not stellar.
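The exhaustive search described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a practical decoder: G is a list of K rows of bits representing $T(\cdot)$, `y` holds the N matched-filter outputs, and ties (which occur with probability zero) are resolved by keeping the first maximizer found instead of picking one at random. The function names are hypothetical.

```python
from itertools import product


def encode_antipodal(d, G):
    """Apply d -> dG over F2 and then the componentwise antipodal mapping."""
    K, N = len(G), len(G[0])
    return tuple(1 - 2 * (sum(d[k] & G[k][j] for k in range(K)) % 2) for j in range(N))


def max_correlation_decode(y, G):
    """Exhaustive search over all 2^K data tuples for the highest correlation with y."""
    best_d, best_corr = None, float("-inf")
    for d in product((0, 1), repeat=len(G)):
        x = encode_antipodal(d, G)
        corr = sum(xj * yj for xj, yj in zip(x, y))
        if corr > best_corr:
            best_d, best_corr = d, corr
    return best_d


G = [[1, 0, 1], [0, 1, 1]]  # (2, 3) systematic single parity check encoder
print(max_correlation_decode((0.2, 1.0, -0.9), G))  # prints (1, 0)
```

The loop visits all $2^{K}$ data tuples, which is exactly the exponential cost that the text warns about.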



29.7.2 Wagner's Rule 

For the (K, K + 1) systematic single parity check encoder (29.20), the decoding can
be performed very efficiently using a decision algorithm that is called Wagner's
Rule in honor of C.A. Wagner. Unlike the brute-force approach that considers all
possible data tuples and which thus has a complexity which is exponential in K,
the complexity of Wagner's Rule is linear in K.

Wagner's Rule can be summarized as follows. Consider the (K + 1)-tuple

$$\xi_\eta = \begin{cases} +1 & \text{if } Y_\eta \ge 0, \\ -1 & \text{otherwise,} \end{cases} \quad \eta = 1, \ldots, K + 1. \tag{29.44}$$

If this tuple has an even number of negative components, then guess that the
encoder's output is $(\xi_1, \ldots, \xi_{K+1})$ and that the data sequence is thus the inverse of
$(\xi_1, \ldots, \xi_K)$ under the componentwise antipodal mapping $\Upsilon_K$, i.e., that the data
tuple is $(1 - \xi_1)/2, \ldots, (1 - \xi_K)/2$. Otherwise, flip the sign of $\xi_{\eta_*}$ corresponding to
the $Y_{\eta_*}$ of smallest magnitude. I.e., guess that the encoder's output is

$$\xi_1, \ldots, \xi_{\eta_*-1}, \, -\xi_{\eta_*}, \, \xi_{\eta_*+1}, \ldots, \xi_{K+1}, \tag{29.45}$$

and that the data bits are

$$\frac{1 - \xi_1}{2}, \ldots, \frac{1 - \xi_{\eta_*-1}}{2}, \, \frac{1 + \xi_{\eta_*}}{2}, \, \frac{1 - \xi_{\eta_*+1}}{2}, \ldots, \frac{1 - \xi_K}{2}, \tag{29.46}$$

where $\eta_*$ is the element of $\{1, \ldots, K + 1\}$ satisfying

$$|Y_{\eta_*}| = \min_{1 \le \eta \le K+1} |Y_\eta|. \tag{29.47}$$

(If $\eta_* = K + 1$, then the flipped component is the parity symbol and none of the
data bits is altered.)

Proof that Wagner's Rule is Optimal. Recall that the (K, K + 1) single parity
check code with antipodal signaling consists of all ±1-valued (K + 1)-tuples having
an even number of −1's. We seek to find the tuple that among all such tuples
maximizes the correlation with the received tuple $(Y_1, \ldots, Y_{K+1})$. The tuple defined in
(29.44) is the tuple that among all tuples in $\{-1, +1\}^{K+1}$ has the highest
correlation with $(Y_1, \ldots, Y_{K+1})$. Since flipping the sign of $\xi_\eta$ reduces the correlation by
$2|Y_\eta|$, the tuple (29.45) has the second-highest correlation among all the tuples in
$\{-1, +1\}^{K+1}$. Since the tuples (29.44) and (29.45) differ in one component, exactly
one of them has an even number of negative components. That tuple thus
maximizes the correlation among all tuples in $\{-1, +1\}^{K+1}$ that have an even number
of negative components and is thus the tuple we are after.

Since the encoder is systematic, the data tuple that generates a given encoder
output is easily found by considering the first K components of the encoder output
and by then applying the mapping $+1 \mapsto 0$ and $-1 \mapsto 1$, i.e., $\xi \mapsto (1 - \xi)/2$. □
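Wagner's Rule translates almost line by line into code. The sketch below assumes that `y` is a sequence of the K + 1 real-valued observations for the systematic single parity check encoder (29.20); the function name is illustrative.

```python
def wagner_decode(y):
    """Wagner's Rule: hard decisions, then flip the least reliable one if parity fails."""
    # Componentwise hard decisions, as in (29.44).
    xi = [1 if v >= 0 else -1 for v in y]
    # An odd number of -1's violates the even-parity constraint of the code.
    if sum(1 for s in xi if s < 0) % 2 == 1:
        weakest = min(range(len(y)), key=lambda j: abs(y[j]))
        xi[weakest] = -xi[weakest]
    # The encoder is systematic: map the first K symbols back via xi -> (1 - xi)/2.
    return tuple((1 - s) // 2 for s in xi[:-1])


print(wagner_decode((0.2, 1.0, -0.9)))  # prints (1, 0)
```

In the example the hard decisions (+1, +1, −1) contain one negative component, so the sign of the first component, whose observation has the smallest magnitude, is flipped. The work done is a single pass over the K + 1 observations.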



29.7.3 The Probability of a Block Error 

We next address the performance of the detector that we designed in Section 29.7.1
when we sought to minimize the probability of a block error. We continue to assume
that the encoder is a linear binary (K, N) block encoder with antipodal signaling,
so the encoder function enc(·) can be written as $\text{enc} = \Upsilon_N \circ T$, where $T \colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$
is a linear one-to-one mapping and $\Upsilon_N(\cdot)$ is the componentwise antipodal mapping.
An Upper Bound

It is usually very difficult to precisely evaluate the probability of a block error. A
very useful bound is the Union Bound, which we encountered in Section 21.6.3.
Denoting by $p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d})$ the probability of error of our guessing rule
conditional on the binary K-tuple $\mathbf{D} = \mathbf{d}$ being fed to the encoder, we can use (21.59),
Table 29.1, and (29.18) to obtain

$$p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) \le \sum_{\mathbf{d}' \in \mathbb{F}_2^{K} \setminus \{\mathbf{d}\}} Q\Biggl(\sqrt{\frac{2 A^2 \, d_H\bigl(T(\mathbf{d}'), T(\mathbf{d})\bigr)}{N_0}}\Biggr). \tag{29.48}$$

It is customary to group all the equal terms on the RHS of (29.48) and to write
the bound in the equivalent form

$$p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) \le \sum_{\nu=1}^{N} \#\bigl\{\mathbf{d}' \in \mathbb{F}_2^{K} : d_H\bigl(T(\mathbf{d}'), T(\mathbf{d})\bigr) = \nu\bigr\} \; Q\Biggl(\sqrt{\frac{2 A^2 \nu}{N_0}}\Biggr), \tag{29.49}$$

where

$$\#\bigl\{\mathbf{d}' \in \mathbb{F}_2^{K} : d_H\bigl(T(\mathbf{d}'), T(\mathbf{d})\bigr) = \nu\bigr\} \tag{29.50}$$

is the number of data tuples that are mapped by $T(\cdot)$ to a binary N-tuple that
is at Hamming distance $\nu$ from $T(\mathbf{d})$, and where the sum excludes $\nu = 0$ because
the fact that $T(\cdot)$ is one-to-one implies that if $\mathbf{d}' \neq \mathbf{d}$ then the Hamming distance
between $T(\mathbf{d}')$ and $T(\mathbf{d})$ must be at least one.

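We next show that the linearity of $T(\cdot)$ implies that the RHS of (29.49) does not
depend on $\mathbf{d}$. (In Section 29.9 we show that this is also true of the LHS.) To this
end we show that for every $\nu \in \{1, \ldots, N\}$ and for every $\mathbf{d} \in \mathbb{F}_2^{K}$,

$$\#\bigl\{\mathbf{d}' \in \mathbb{F}_2^{K} : d_H\bigl(T(\mathbf{d}'), T(\mathbf{d})\bigr) = \nu\bigr\} = \#\bigl\{\tilde{\mathbf{d}} \in \mathbb{F}_2^{K} : w_H\bigl(T(\tilde{\mathbf{d}})\bigr) = \nu\bigr\} \tag{29.51}$$

$$= \#\bigl\{\mathbf{c} \in \operatorname{Image}(T) : w_H(\mathbf{c}) = \nu\bigr\}, \tag{29.52}$$

where the RHS of (29.51) is the evaluation of the LHS at $\mathbf{d} = \mathbf{0}$. To prove (29.51)
we note that the mapping $\mathbf{d}' \mapsto \mathbf{d}' \oplus \mathbf{d}$ is a one-to-one mapping from the set whose
cardinality is written on the LHS onto the set whose cardinality is written on the
RHS, because

$$\Bigl(d_H\bigl(T(\mathbf{d}'), T(\mathbf{d})\bigr) = \nu\Bigr) \iff \Bigl(w_H\bigl(T(\mathbf{d}) \oplus T(\mathbf{d}')\bigr) = \nu\Bigr) \iff \Bigl(w_H\bigl(T(\mathbf{d} \oplus \mathbf{d}')\bigr) = \nu\Bigr),$$

where the first equivalence follows from (29.13), and where the second equivalence
follows from the linearity of $T(\cdot)$. To prove (29.52) we merely substitute $\mathbf{c}$ for $T(\tilde{\mathbf{d}})$
in (29.51) and use the fact that $T(\cdot)$ is one-to-one.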

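Combining (29.49) with (29.52) we obtain the bound

\[
p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) \le \sum_{\nu=1}^{N} \#\bigl\{\mathbf{c} \in \mathrm{Image}(T) : w_H(\mathbf{c}) = \nu\bigr\}\; Q\Biggl(\sqrt{\frac{2A^2\nu}{N_0}}\Biggr). \tag{29.53}
\]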




The list of N + 1 nonnegative integers

\[
\Bigl(\#\bigl\{\mathbf{c} \in \mathrm{Image}(T) : w_H(\mathbf{c}) = 0\bigr\}, \ldots, \#\bigl\{\mathbf{c} \in \mathrm{Image}(T) : w_H(\mathbf{c}) = N\bigr\}\Bigr)
\]

(whose first term is equal to one and whose terms sum to $2^K$) is called the weight
enumerator of the code.



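For example, for the (K, K + 1) single parity check code

\[
\#\bigl\{\tilde{\mathbf{d}} \in \mathbb{F}_2^K : w_H\bigl(T(\tilde{\mathbf{d}})\bigr) = \nu\bigr\} =
\begin{cases}
0 & \text{if } \nu \text{ is odd,} \\
\binom{K+1}{\nu} & \text{if } \nu \text{ is even,}
\end{cases}
\qquad \nu = 0, \ldots, K+1,
\]

because this code consists of all (K + 1)-tuples of even Hamming weight. Consequently,
this code's weight enumerator is

\[
\begin{cases}
\Bigl(1, 0, \binom{K+1}{2}, 0, \binom{K+1}{4}, 0, \ldots, 0, \binom{K+1}{K+1}\Bigr) & \text{if K is odd,} \\
\Bigl(1, 0, \binom{K+1}{2}, 0, \binom{K+1}{4}, 0, \ldots, \binom{K+1}{K}, 0\Bigr) & \text{if K is even.}
\end{cases}
\]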
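The weight enumerator of a small code can be tabulated by enumerating all $2^K$ data tuples. The following is a minimal sketch, not part of the original text: it assumes the encoder is given as a K x N generator matrix `G` of 0/1 entries acting as d ↦ dG over F_2, and the function names are illustrative.

```python
from itertools import product

def encode(d, G):
    """Map the data tuple d to the codeword dG over F_2 (G is a K x N 0/1 matrix)."""
    n = len(G[0])
    return tuple(sum(d[k] * G[k][j] for k in range(len(d))) % 2 for j in range(n))

def weight_enumerator(G):
    """Return the list whose nu-th entry counts the data tuples mapped to weight nu.

    If the encoder is one-to-one, this equals the number of codewords of weight nu.
    """
    k, n = len(G), len(G[0])
    counts = [0] * (n + 1)
    for d in product((0, 1), repeat=k):
        counts[sum(encode(d, G))] += 1
    return counts

def single_parity_check_generator(k):
    """Systematic generator matrix [I_K | 1] of the (K, K+1) single parity check code."""
    return [[1 if j == i else 0 for j in range(k)] + [1] for i in range(k)]

if __name__ == "__main__":
    # K = 4: only even weights occur, with counts C(5,0), C(5,2), C(5,4).
    print(weight_enumerator(single_parity_check_generator(4)))  # [1, 0, 10, 0, 5, 0]
```

The brute-force enumeration has complexity exponential in K, so it is only meant for checking small examples against closed-form expressions.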

The minimum Hamming distance $d_{\min,H}$ of a linear (K, N) $\mathbb{F}_2$ code is the
smallest Hamming distance between distinct elements of the code. (If K = 0, i.e.,
if the only codeword is the all-zero codeword, then, by convention, the minimum
distance is said to be infinite.) By (29.52) it follows that (for K > 0) the minimum
Hamming distance of a code is also the smallest weight that a nonzero codeword
can have:

\[
d_{\min,H} = \min_{\mathbf{c} \in \mathrm{Image}(T) \setminus \{\mathbf{0}\}} w_H(\mathbf{c}). \tag{29.54}
\]

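With this definition we can rewrite (29.53) as

\[
p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) \le \sum_{\nu = d_{\min,H}}^{N} \#\bigl\{\mathbf{c} \in \mathrm{Image}(T) : w_H(\mathbf{c}) = \nu\bigr\}\; Q\Biggl(\sqrt{\frac{2A^2\nu}{N_0}}\Biggr). \tag{29.55}
\]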
Engineers sometimes approximate the RHS of (29.55) by its first term:

\[
\#\bigl\{\mathbf{c} \in \mathrm{Image}(T) : w_H(\mathbf{c}) = d_{\min,H}\bigr\}\; Q\Biggl(\sqrt{\frac{2A^2 d_{\min,H}}{N_0}}\Biggr). \tag{29.56}
\]

This is reasonable when $A^2/N_0 \gg 1$ because the Q(·) function decays very rapidly;
see (19.18).

The term

\[
\#\bigl\{\mathbf{c} \in \mathrm{Image}(T) : w_H(\mathbf{c}) = d_{\min,H}\bigr\} \tag{29.57}
\]

is sometimes called the number of nearest neighbors.

A Lower Bound

Using the results of Section 21.6.4, we can obtain a lower bound on the probability
of a block error. Indeed, by (21.65), Table 29.1, (29.18), the monotonicity of Q(·),
and the definition of $d_{\min,H}$,

\[
p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) \ge Q\Biggl(\sqrt{\frac{2A^2 d_{\min,H}}{N_0}}\Biggr). \tag{29.58}
\]
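To get a feel for how the union bound (29.55), the nearest-neighbor approximation (29.56), and the lower bound (29.58) compare, one can evaluate them numerically from a weight enumerator. The sketch below is illustrative only: the function names are hypothetical, the weight enumerator is passed as a list indexed by the weight ν, and the code is assumed to contain at least one nonzero codeword (K > 0).

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = Pr[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def block_error_bounds(weights, amplitude, n0):
    """Return (lower bound (29.58), nearest-neighbor term (29.56), union bound (29.55)).

    weights[nu] is the number of codewords of Hamming weight nu.
    """
    # Minimum distance: smallest positive weight with a nonzero count, cf. (29.54).
    d_min = next(nu for nu in range(1, len(weights)) if weights[nu] > 0)

    def q_term(nu):
        return q_function(math.sqrt(2.0 * amplitude ** 2 * nu / n0))

    lower = q_term(d_min)
    nearest = weights[d_min] * q_term(d_min)
    union = sum(weights[nu] * q_term(nu) for nu in range(d_min, len(weights)))
    return lower, nearest, union

if __name__ == "__main__":
    # Weight enumerator of the (4, 5) single parity check code.
    print(block_error_bounds([1, 0, 10, 0, 5, 0], amplitude=1.0, n0=0.5))
```

By construction the three returned values are nondecreasing, and the gap between the last two shrinks as $A^2/N_0$ grows.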



29.8 Minimizing the Bit Error Rate 

In some applications we want to minimize the number of data bits that are incor- 
rectly decoded. This performance criterion leads to a different guessing rule, which 
we derive and analyze in this section. 

29.8.1 Optimal Decoding 

We next derive the guessing rule that minimizes the average probability of a bit
error, or the Bit Error Rate. Conceptually, this is simple. For each $\kappa \in \{1, \ldots, K\}$
our guess of the κ-th data bit $D_\kappa$ should minimize the probability of error. This
problem falls under the category of binary hypothesis testing, and, since $D_\kappa$ is a
priori equally likely to be 0 or 1, the Maximum-Likelihood rule of Section 20.8 is
optimal. To compute the likelihood-ratio function, we treat the other data bits
$D_1, \ldots, D_{\kappa-1}, D_{\kappa+1}, \ldots, D_K$ as unobserved random parameters (Section 20.15.1).
Thus, using (20.101) with the random parameter now corresponding to the tuple
$(D_1, \ldots, D_{\kappa-1}, D_{\kappa+1}, \ldots, D_K)$ we obtain$^2$

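\[
\begin{aligned}
f_{\mathbf{Y} \mid D_\kappa = 0}(y_1, \ldots, y_N)
&= 2^{-(K-1)} \sum_{\mathbf{d} \in \mathcal{A}_{\kappa,0}} f_{Y_1, \ldots, Y_N \mid \mathbf{D} = \mathbf{d}}(y_1, \ldots, y_N) && (29.59) \\
&= 2^{-(K-1)} (\pi N_0)^{-N/2} \sum_{\mathbf{d} \in \mathcal{A}_{\kappa,0}} \prod_{\eta=1}^{N} \exp\Biggl(-\frac{\bigl(y_\eta - A x_\eta(\mathbf{d})\bigr)^2}{N_0}\Biggr), && (29.60)
\end{aligned}
\]

where the set $\mathcal{A}_{\kappa,0}$ consists of those tuples in $\mathbb{F}_2^K$ whose κ-th component is zero

\[
\mathcal{A}_{\kappa,0} = \bigl\{(d_1, \ldots, d_K) \in \mathbb{F}_2^K : d_\kappa = 0\bigr\}. \tag{29.61}
\]

Likewise,

\[
\begin{aligned}
f_{\mathbf{Y} \mid D_\kappa = 1}(y_1, \ldots, y_N)
&= 2^{-(K-1)} \sum_{\mathbf{d} \in \mathcal{A}_{\kappa,1}} f_{Y_1, \ldots, Y_N \mid \mathbf{D} = \mathbf{d}}(y_1, \ldots, y_N) && (29.62) \\
&= 2^{-(K-1)} (\pi N_0)^{-N/2} \sum_{\mathbf{d} \in \mathcal{A}_{\kappa,1}} \prod_{\eta=1}^{N} \exp\Biggl(-\frac{\bigl(y_\eta - A x_\eta(\mathbf{d})\bigr)^2}{N_0}\Biggr), && (29.63)
\end{aligned}
\]

where we similarly define

\[
\mathcal{A}_{\kappa,1} = \bigl\{(d_1, \ldots, d_K) \in \mathbb{F}_2^K : d_\kappa = 1\bigr\}. \tag{29.64}
\]

2 Our assumption that the data are IID random bits guarantees that the random parameter
$\Theta = (D_1, \ldots, D_{\kappa-1}, D_{\kappa+1}, \ldots, D_K)$ is independent of the RV $D_\kappa$ that we wish to guess.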

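Using Theorem 20.7.1 and (29.60) & (29.63) we obtain the following.

Proposition 29.8.1 (Minimizing the BER). Consider the problem of guessing $D_\kappa$
based on $\mathbf{Y}$ for the setup of Section 29.6, where $\kappa \in \{1, \ldots, K\}$. The decision rule
that guesses "$D_\kappa = 0$" if

\[
\sum_{\mathbf{d} \in \mathcal{A}_{\kappa,0}} \prod_{\eta=1}^{N} \exp\Biggl(-\frac{\bigl(y_\eta - A x_\eta(\mathbf{d})\bigr)^2}{N_0}\Biggr)
> \sum_{\mathbf{d} \in \mathcal{A}_{\kappa,1}} \prod_{\eta=1}^{N} \exp\Biggl(-\frac{\bigl(y_\eta - A x_\eta(\mathbf{d})\bigr)^2}{N_0}\Biggr);
\]

that guesses "$D_\kappa = 1$" if

\[
\sum_{\mathbf{d} \in \mathcal{A}_{\kappa,0}} \prod_{\eta=1}^{N} \exp\Biggl(-\frac{\bigl(y_\eta - A x_\eta(\mathbf{d})\bigr)^2}{N_0}\Biggr)
< \sum_{\mathbf{d} \in \mathcal{A}_{\kappa,1}} \prod_{\eta=1}^{N} \exp\Biggl(-\frac{\bigl(y_\eta - A x_\eta(\mathbf{d})\bigr)^2}{N_0}\Biggr);
\]

and that guesses at random in case of equality minimizes the probability of guessing
the data bit $D_\kappa$ incorrectly.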

The difficulty in implementing this decision rule is that, unless we exploit some
algebraic structure, the computation of the sums above has exponential complexity
because the number of terms in each sum is $2^{K-1}$.

It is interesting to note that, unlike the decision rule that minimizes the probability
of a block error, the above decision rule depends on the value of $N_0/2$.
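A brute-force version of the rule in Proposition 29.8.1 can be sketched as follows. This is only an illustration under stated assumptions: the encoder is represented by a hypothetical K x N generator matrix `G` over F_2, the antipodal mapping sends 0 to +1 and 1 to −1, bit positions are 0-based, and a tie is reported as `None` instead of being resolved at random. For large N or small $N_0$ the exponentials may underflow, which a practical implementation would handle in the log domain.

```python
import math
from itertools import product

def encode_antipodal(d, G):
    """Apply d -> dG over F_2, then map 0 -> +1 and 1 -> -1 componentwise."""
    n = len(G[0])
    bits = [sum(d[k] * G[k][j] for k in range(len(d))) % 2 for j in range(n)]
    return [1 - 2 * b for b in bits]

def bit_map_decode(y, G, amplitude, n0, kappa):
    """Guess the data bit at 0-based position kappa by comparing the two sums
    of Proposition 29.8.1 (each has 2**(K-1) terms)."""
    sums = [0.0, 0.0]
    for d in product((0, 1), repeat=len(G)):
        x = encode_antipodal(d, G)
        exponent = -sum((yi - amplitude * xi) ** 2 for yi, xi in zip(y, x)) / n0
        sums[d[kappa]] += math.exp(exponent)
    if sums[0] == sums[1]:
        return None  # tie: the proposition prescribes a random guess
    return 0 if sums[0] > sums[1] else 1

if __name__ == "__main__":
    G = [[1, 0, 1], [0, 1, 1]]      # (2, 3) single parity check code
    y = [-1.0, 1.0, -1.0]           # noiseless observation for d = (1, 0), A = 1
    print([bit_map_decode(y, G, 1.0, 1.0, kappa) for kappa in range(2)])
```

Note that, in line with the remark above, the arguments `amplitude` and `n0` genuinely affect the comparison because the sums are not dominated by a single term in general.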



29.8.2 The Probability of a Bit Error 

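We next obtain bounds on the probability that the detector of Proposition 29.8.1
errs in guessing the κ-th data bit $D_\kappa$. We denote this probability by $p^*_\kappa$.

An Upper Bound

Since the detector of Proposition 29.8.1 is optimal, the probability that it errs
in decoding the κ-th data bit $D_\kappa$ cannot exceed the probability of error of the
suboptimal rule whose guess for $D_\kappa$ is the κ-th bit of the message produced by the
detector of Section 29.7. Thus, if $\phi_{\text{MAP}}(\cdot)$ denotes the decision rule of Section 29.7,
then

\[
p^*_\kappa \le \Pr\bigl[\mathbf{D} \oplus \phi_{\text{MAP}}(\mathbf{Y}) \in \mathcal{A}_{\kappa,1}\bigr], \qquad \kappa \in \{1, \ldots, K\}, \tag{29.65}
\]

where the set $\mathcal{A}_{\kappa,1}$ was defined in (29.64) as the set of messages whose κ-th
component is equal to one, and where $\mathbf{Y}$ is the observed N-tuple whose components
are given in (29.37b).

Since the data are IID random bits, we can rewrite (29.65) as

\[
p^*_\kappa \le \frac{1}{2^K} \sum_{\mathbf{d} \in \mathbb{F}_2^K} \sum_{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}} \Pr\bigl[\phi_{\text{MAP}}(\mathbf{Y}) = \mathbf{d} \oplus \tilde{\mathbf{d}} \bigm| \mathbf{D} = \mathbf{d}\bigr], \qquad \kappa \in \{1, \ldots, K\}. \tag{29.66}
\]

Since $\phi_{\text{MAP}}(\mathbf{Y})$ can only equal $\mathbf{d} \oplus \tilde{\mathbf{d}}$ if $\mathbf{Y}$ is at least as close in Euclidean distance
to $\mathrm{enc}(\mathbf{d} \oplus \tilde{\mathbf{d}})$ as it is to $\mathrm{enc}(\mathbf{d})$, it follows from Lemma 20.14.1, Table 29.1, and
(29.18) that

\[
\begin{aligned}
\Pr\bigl[\phi_{\text{MAP}}(\mathbf{Y}) = \mathbf{d} \oplus \tilde{\mathbf{d}} \bigm| \mathbf{D} = \mathbf{d}\bigr]
&\le Q\Biggl(\frac{A\, d_E\bigl(\Upsilon_N(T(\mathbf{d} \oplus \tilde{\mathbf{d}})), \Upsilon_N(T(\mathbf{d}))\bigr)}{2\sqrt{N_0/2}}\Biggr) \\
&= Q\Biggl(\sqrt{\frac{A^2\, d_E^2\bigl(\Upsilon_N(T(\mathbf{d} \oplus \tilde{\mathbf{d}})), \Upsilon_N(T(\mathbf{d}))\bigr)}{2N_0}}\Biggr) \\
&= Q\Biggl(\sqrt{\frac{2A^2\, d_H\bigl(T(\mathbf{d} \oplus \tilde{\mathbf{d}}), T(\mathbf{d})\bigr)}{N_0}}\Biggr) \\
&= Q\Biggl(\sqrt{\frac{2A^2\, w_H\bigl(T(\mathbf{d} \oplus \tilde{\mathbf{d}}) \oplus T(\mathbf{d})\bigr)}{N_0}}\Biggr) \\
&= Q\Biggl(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\Biggr).
\end{aligned}
\tag{29.67}
\]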



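It follows from (29.66) and (29.67), upon noting that the RHS of (29.67) does not
depend on the transmitted message $\mathbf{d}$, that

\[
p^*_\kappa \le \sum_{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}} Q\Biggl(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\Biggr), \qquad \kappa \in \{1, \ldots, K\}. \tag{29.68}
\]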



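This bound is sometimes written as

\[
p^*_\kappa \le \sum_{\nu = d_{\min,H}}^{N} \gamma(\nu, \kappa)\; Q\Biggl(\sqrt{\frac{2A^2\nu}{N_0}}\Biggr), \qquad \kappa \in \{1, \ldots, K\}, \tag{29.69a}
\]

where $\gamma(\nu, \kappa)$ denotes the number of elements $\tilde{\mathbf{d}}$ of $\mathbb{F}_2^K$ whose κ-th component is
equal to one and for which $T(\tilde{\mathbf{d}})$ is of Hamming weight $\nu$, i.e.,

\[
\gamma(\nu, \kappa) = \#\bigl\{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1} : w_H\bigl(T(\tilde{\mathbf{d}})\bigr) = \nu\bigr\}, \tag{29.69b}
\]

and where the minimum Hamming distance $d_{\min,H}$ is defined in (29.54).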
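Sometimes one is more interested in the arithmetic average of $p^*_\kappa$

\[
p^*_b \triangleq \frac{1}{K} \sum_{\kappa=1}^{K} p^*_\kappa, \tag{29.70}
\]

which is the optimal bit error rate. We next show that (29.68) leads to the
upper bound

\[
p^*_b \le \frac{1}{K} \sum_{\tilde{\mathbf{d}} \in \mathbb{F}_2^K} w_H(\tilde{\mathbf{d}})\; Q\Biggl(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\Biggr). \tag{29.71}
\]

This follows from the calculation

\[
\begin{aligned}
\sum_{\kappa=1}^{K} p^*_\kappa
&\le \sum_{\kappa=1}^{K} \sum_{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}} Q\Biggl(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\Biggr) \\
&= \sum_{\kappa=1}^{K} \sum_{\tilde{\mathbf{d}} \in \mathbb{F}_2^K} Q\Biggl(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\Biggr)\, \mathrm{I}\{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}\} \\
&= \sum_{\tilde{\mathbf{d}} \in \mathbb{F}_2^K} Q\Biggl(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\Biggr) \sum_{\kappa=1}^{K} \mathrm{I}\{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}\} \\
&= \sum_{\tilde{\mathbf{d}} \in \mathbb{F}_2^K} Q\Biggl(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\Biggr)\, w_H(\tilde{\mathbf{d}}),
\end{aligned}
\]

where the inequality in the first line follows from (29.68); the equality in the second
by introducing the indicator function for the set $\mathcal{A}_{\kappa,1}$ and extending the summation;
the equality in the third line by changing the order of summation; and the
equality in the last line by noting that every $\tilde{\mathbf{d}} \in \mathbb{F}_2^K$ is in exactly $w_H(\tilde{\mathbf{d}})$ of the sets
$\mathcal{A}_{1,1}, \ldots, \mathcal{A}_{K,1}$.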



A Lower Bound 

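We next show that, for every $\kappa \in \{1, \ldots, K\}$, the probability $p^*_\kappa$ that the optimal
detector for guessing the κ-th data bit errs is lower-bounded by

\[
p^*_\kappa \ge \max_{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}} Q\Biggl(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\Biggr), \tag{29.72}
\]

where $\mathcal{A}_{\kappa,1}$ denotes the set of binary K-tuples whose κ-th component is equal to
one (29.64). To derive (29.72), fix some $\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}$ and note that for every $\mathbf{d}' \in \mathbb{F}_2^K$

\[
\bigl(\mathbf{d}' \in \mathcal{A}_{\kappa,0}\bigr) \iff \bigl(\mathbf{d}' \oplus \tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}\bigr). \tag{29.73}
\]

This allows us to express $f_{\mathbf{Y} \mid D_\kappa = 1}(\mathbf{y})$ for every $\mathbf{y} \in \mathbb{R}^N$ as

\[
\begin{aligned}
f_{\mathbf{Y} \mid D_\kappa = 1}(\mathbf{y})
&= 2^{-(K-1)} \sum_{\mathbf{d} \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}}(\mathbf{y}) \\
&= 2^{-(K-1)} \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}} \oplus \mathbf{d}'}(\mathbf{y}),
\end{aligned}
\tag{29.74}
\]

where the first equality follows from (29.62) and the second from (29.73).

Using the exact expression for the probability of error in binary hypothesis testing
(20.20) we have:

\[
\begin{aligned}
p^*_\kappa
&= \frac{1}{2} \int_{\mathbb{R}^N} \min\bigl\{f_{\mathbf{Y} \mid D_\kappa = 0}(\mathbf{y}),\, f_{\mathbf{Y} \mid D_\kappa = 1}(\mathbf{y})\bigr\}\, d\mathbf{y} \\
&= \frac{1}{2} \int_{\mathbb{R}^N} \min\Bigl\{2^{-(K-1)} \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y}),\; 2^{-(K-1)} \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}} \oplus \mathbf{d}'}(\mathbf{y})\Bigr\}\, d\mathbf{y} \\
&= 2^{-(K-1)}\, \frac{1}{2} \int_{\mathbb{R}^N} \min\Bigl\{\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y}),\; \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}} \oplus \mathbf{d}'}(\mathbf{y})\Bigr\}\, d\mathbf{y} \\
&\ge 2^{-(K-1)}\, \frac{1}{2} \int_{\mathbb{R}^N} \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} \min\bigl\{f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y}),\, f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}} \oplus \mathbf{d}'}(\mathbf{y})\bigr\}\, d\mathbf{y} \\
&= 2^{-(K-1)} \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} \frac{1}{2} \int_{\mathbb{R}^N} \min\bigl\{f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y}),\, f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}} \oplus \mathbf{d}'}(\mathbf{y})\bigr\}\, d\mathbf{y} \\
&= 2^{-(K-1)} \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} Q\Biggl(\sqrt{\frac{2A^2\, d_H\bigl(T(\mathbf{d}'), T(\mathbf{d}' \oplus \tilde{\mathbf{d}})\bigr)}{N_0}}\Biggr) \\
&= 2^{-(K-1)} \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} Q\Biggl(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\Biggr) \\
&= Q\Biggl(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\Biggr),
\end{aligned}
\]

where the first line follows from (20.20); the second by the explicit forms (29.59) &
(29.74) of the conditional densities $f_{\mathbf{Y} \mid D_\kappa = 0}(\cdot)$ and $f_{\mathbf{Y} \mid D_\kappa = 1}(\cdot)$; the third by pulling
the common term $2^{-(K-1)}$ outside the minimum; the fourth because the minimum
between two sums with an equal number of terms is lower-bounded by the sum of
the minima between the corresponding terms; the fifth by swapping the summation
and integration; the sixth by Expression (20.20) for the optimal probability of error
for the binary hypothesis testing between $\mathbf{D} = \mathbf{d}'$ and $\mathbf{D} = \tilde{\mathbf{d}} \oplus \mathbf{d}'$; the seventh by
the linearity of T(·); and the final line because the cardinality of $\mathcal{A}_{\kappa,0}$ is $2^{K-1}$.
Since the above derivation holds for every $\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}$, we may choose $\tilde{\mathbf{d}}$ to yield the
tightest bound, thus establishing (29.72).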
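For a small code, the upper bound (29.68) and the lower bound (29.72) on the probability of a bit error can be evaluated by enumerating the data tuples whose κ-th component is one. The following sketch is illustrative and not from the original text: it assumes a K x N generator matrix `G` over F_2 (a hypothetical representation of the encoder) and uses 0-based bit positions.

```python
import math
from itertools import product

def q_function(x):
    """Gaussian tail probability Q(x) = Pr[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def codeword_weight(d, G):
    """Hamming weight of the codeword dG over F_2."""
    n = len(G[0])
    return sum(sum(d[k] * G[k][j] for k in range(len(d))) % 2 for j in range(n))

def bit_error_bounds(G, amplitude, n0, kappa):
    """Return (lower bound (29.72), upper bound (29.68)) for 0-based bit position kappa."""
    terms = [
        q_function(math.sqrt(2.0 * amplitude ** 2 * codeword_weight(d, G) / n0))
        for d in product((0, 1), repeat=len(G))
        if d[kappa] == 1  # data tuples in the set A_{kappa,1}
    ]
    return max(terms), sum(terms)

if __name__ == "__main__":
    G = [[1, 0, 1], [0, 1, 1]]  # (2, 3) single parity check code
    print(bit_error_bounds(G, amplitude=1.0, n0=0.5, kappa=0))
```

For the (1, N) repetition code the set of data tuples with a one in the first position is a singleton, so the two bounds coincide.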



29.9 Assuming the All-Zero Codeword 

When simulating linear binary block encoders with antipodal signaling over the 
Gaussian channel we rarely simulate the data as IID random bits. Instead we 
assume that the message that is fed to the encoder is the all-zero message and that 
the encoder's output is hence the N-tuple whose components are all +1. In this 
section we shall explain why it is correct to do so. More specifically we shall show 
that $p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d})$ does not depend on the message $\mathbf{d}$ and is thus equal to
$p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{0})$. We shall also prove an analogous result for the decoder that
minimizes the probability of a bit error. The proofs are based on two features of
our setup: the encoder is linear and the Gaussian channel with antipodal inputs is
symmetric in the sense that

\[
f_{Y \mid X = -1}(y) = f_{Y \mid X = +1}(-y), \qquad y \in \mathbb{R}. \tag{29.75}
\]

Indeed, by (29.37b),

\[
\begin{aligned}
f_{Y \mid X = -1}(y)
&= \frac{1}{\sqrt{\pi N_0}}\, e^{-\frac{(y + A)^2}{N_0}} \\
&= \frac{1}{\sqrt{\pi N_0}}\, e^{-\frac{(-y - A)^2}{N_0}} \\
&= f_{Y \mid X = +1}(-y), \qquad y \in \mathbb{R}.
\end{aligned}
\]

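Definition 29.9.1 (Memoryless Binary-Input/Output-Symmetric Channel). We
say that the conditional distribution of $\mathbf{Y} = (Y_1, \ldots, Y_N)$ conditional on $\mathbf{X} =
(X_1, \ldots, X_N)$ corresponds to a memoryless binary-input/output-symmetric
channel if

\[
f_{\mathbf{Y} \mid \mathbf{X} = \mathbf{x}}(\mathbf{y}) = \prod_{\eta=1}^{N} f_{Y \mid X = x_\eta}(y_\eta), \qquad \mathbf{x} \in \{-1, +1\}^N, \tag{29.76a}
\]

where

\[
f_{Y \mid X = -1}(y) = f_{Y \mid X = +1}(-y), \qquad y \in \mathbb{R}. \tag{29.76b}
\]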

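For every $\mathbf{d} \in \mathbb{F}_2^K$ define the mapping $\psi_{\mathbf{d}}\colon \mathbb{R}^N \to \mathbb{R}^N$ as

\[
\psi_{\mathbf{d}}\colon (y_1, \ldots, y_N) \mapsto \bigl(y_1 x_1(\mathbf{d}), \ldots, y_N x_N(\mathbf{d})\bigr). \tag{29.77}
\]

The function $\psi_{\mathbf{d}}(\cdot)$ thus changes the sign of those components of its argument
that correspond to the negative components of $\mathrm{enc}(\mathbf{d})$. The key properties of this
mapping are summarized in the following lemma.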

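Lemma 29.9.2. As in (29.38), let $x_\eta(\mathbf{d})$ denote the result of applying the antipodal
mapping $\Upsilon$ to the η-th component of $T(\mathbf{d})$, where $T\colon \mathbb{F}_2^K \to \mathbb{F}_2^N$ is some one-to-one
linear mapping. Let the conditional law of $(Y_1, \ldots, Y_N)$ given $\mathbf{D} = \mathbf{d}$ be given by
$\prod_{\eta=1}^{N} f_{Y \mid X = x_\eta(\mathbf{d})}(y_\eta)$, where $f_{Y \mid X}(\cdot)$ satisfies the symmetry property (29.75). Let
$\psi_{\mathbf{d}}(\cdot)$ be defined as in (29.77). Then

(i) $\psi_{\mathbf{0}}(\cdot)$ maps each $\mathbf{y} \in \mathbb{R}^N$ to itself.

(ii) For any $\mathbf{d}, \mathbf{d}' \in \mathbb{F}_2^K$ the composition of $\psi_{\mathbf{d}'}$ with $\psi_{\mathbf{d}}$ is given by $\psi_{\mathbf{d} \oplus \mathbf{d}'}$:

\[
\psi_{\mathbf{d}} \circ \psi_{\mathbf{d}'} = \psi_{\mathbf{d} \oplus \mathbf{d}'}. \tag{29.78}
\]

(iii) $\psi_{\mathbf{d}}$ is equal to its inverse

\[
\psi_{\mathbf{d}}\bigl(\psi_{\mathbf{d}}(\mathbf{y})\bigr) = \mathbf{y}, \qquad \mathbf{y} \in \mathbb{R}^N. \tag{29.79}
\]

(iv) For every $\mathbf{d} \in \mathbb{F}_2^K$ the Jacobian of the mapping $\psi_{\mathbf{d}}(\cdot)$ is one.

(v) For every $\mathbf{d} \in \mathbb{F}_2^K$ and every $\mathbf{y} \in \mathbb{R}^N$,

\[
f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}}(\mathbf{y}) = f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}\bigl(\psi_{\mathbf{d}}(\mathbf{y})\bigr). \tag{29.80}
\]

(vi) For any $\mathbf{d}, \mathbf{d}' \in \mathbb{F}_2^K$ and every $\mathbf{y} \in \mathbb{R}^N$,

\[
f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}\bigl(\psi_{\mathbf{d}}(\mathbf{y})\bigr) = f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}' \oplus \mathbf{d}}(\mathbf{y}). \tag{29.81}
\]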

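Proof. Part (i) follows from the definition (29.77) because the linearity of T(·) and
the definition of $\Upsilon_N$ guarantee that $x_\eta(\mathbf{0}) = +1$ for all $\eta \in \{1, \ldots, N\}$. Part (ii)
follows by linearity and from (29.15):

\[
\begin{aligned}
(\psi_{\mathbf{d}} \circ \psi_{\mathbf{d}'})(y_1, \ldots, y_N)
&= \psi_{\mathbf{d}}\bigl(y_1 x_1(\mathbf{d}'), \ldots, y_N x_N(\mathbf{d}')\bigr) \\
&= \bigl(y_1 x_1(\mathbf{d}') x_1(\mathbf{d}), \ldots, y_N x_N(\mathbf{d}') x_N(\mathbf{d})\bigr) \\
&= \bigl(y_1 x_1(\mathbf{d}' \oplus \mathbf{d}), \ldots, y_N x_N(\mathbf{d}' \oplus \mathbf{d})\bigr) \\
&= \psi_{\mathbf{d} \oplus \mathbf{d}'}(y_1, \ldots, y_N),
\end{aligned}
\]

where in the third equality we used (29.15) and the linearity of the encoder.
Part (iii) follows from Parts (i) and (ii). Part (iv) follows from Part (iii) or
directly by computing the partial derivative matrix and noting that it is diagonal
with the diagonal elements being ±1 only. Part (v) follows from (29.75). To prove
Part (vi) we substitute $\mathbf{d}'$ for $\mathbf{d}$ and $\psi_{\mathbf{d}}(\mathbf{y})$ for $\mathbf{y}$ in Part (v) to obtain

\[
\begin{aligned}
f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}\bigl(\psi_{\mathbf{d}}(\mathbf{y})\bigr)
&= f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}\bigl(\psi_{\mathbf{d}'}(\psi_{\mathbf{d}}(\mathbf{y}))\bigr) \\
&= f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}\bigl(\psi_{\mathbf{d} \oplus \mathbf{d}'}(\mathbf{y})\bigr) \\
&= f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d} \oplus \mathbf{d}'}(\mathbf{y}),
\end{aligned}
\]

where the second equality follows from Part (ii), and where the third equality
follows from Part (v). □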
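The identities (29.78), (29.80), and (29.81) of the lemma are easy to check numerically for the Gaussian setup. The sketch below is only a sanity check under stated assumptions: the encoder is represented by a hypothetical K x N generator matrix `G` over F_2, the antipodal mapping sends 0 to +1 and 1 to −1, and the per-component density is the Gaussian one with mean A·x and variance N_0/2.

```python
import math
from itertools import product

def antipodal_codeword(d, G):
    """Componentwise antipodal image (0 -> +1, 1 -> -1) of the codeword dG over F_2."""
    n = len(G[0])
    return [1 - 2 * (sum(d[k] * G[k][j] for k in range(len(d))) % 2) for j in range(n)]

def psi(d, y, G):
    """The sign-flipping map of (29.77): multiply y componentwise by x(d)."""
    return [yi * xi for yi, xi in zip(y, antipodal_codeword(d, G))]

def gaussian_density(y, d, G, amplitude, n0):
    """Conditional density of Y given D = d (independent components, variance N0/2)."""
    x = antipodal_codeword(d, G)
    return math.prod(
        math.exp(-(yi - amplitude * xi) ** 2 / n0) / math.sqrt(math.pi * n0)
        for yi, xi in zip(y, x)
    )

def xor(d1, d2):
    """Componentwise mod-2 addition of two binary tuples."""
    return tuple(a ^ b for a, b in zip(d1, d2))

def check_lemma(G, y, amplitude=1.0, n0=1.0):
    """Return True if (29.78), (29.80), and (29.81) hold numerically at the point y."""
    zero = (0,) * len(G)
    tuples = list(product((0, 1), repeat=len(G)))
    for d in tuples:
        f_d = gaussian_density(y, d, G, amplitude, n0)
        f_0 = gaussian_density(psi(d, y, G), zero, G, amplitude, n0)
        if not math.isclose(f_d, f_0):                                   # (29.80)
            return False
        for d2 in tuples:
            if psi(d, psi(d2, y, G), G) != psi(xor(d, d2), y, G):        # (29.78)
                return False
            lhs = gaussian_density(psi(d, y, G), d2, G, amplitude, n0)
            rhs = gaussian_density(y, xor(d2, d), G, amplitude, n0)
            if not math.isclose(lhs, rhs):                               # (29.81)
                return False
    return True

if __name__ == "__main__":
    print(check_lemma([[1, 0, 1], [0, 1, 1]], [0.3, -1.2, 0.7]))
```

A numerical check at a few points is of course no substitute for the proof; it merely helps catch sign or indexing mistakes when implementing the mapping.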

With the aid of this lemma we can now justify the all-zero assumption in the
analysis of the probability of a block error. We shall state the result not only for
the Gaussian setup but also for the more general case where the conditional density
$f_{\mathbf{Y} \mid \mathbf{X}}(\cdot)$ corresponds to a memoryless binary-input/output-symmetric channel.

Theorem 29.9.3. Consider the setup of Section 29.6 with the conditional density
$f_{\mathbf{Y} \mid \mathbf{X}}(\cdot)$ corresponding to a memoryless binary-input/output-symmetric channel.
Let $p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d})$ denote the conditional probability of a block error for
the detector of Proposition 29.7.1, conditional on the data tuple being $\mathbf{d}$. Then,

\[
p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) = p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{0}), \qquad \mathbf{d} \in \mathbb{F}_2^K. \tag{29.82}
\]

Proof. The proof of this result is not very difficult, but there is a slight technicality
that arises from the way ties are resolved. Since on the Gaussian channel ties occur
with probability zero (Proposition 21.6.2), this issue could be ignored. But we
prefer not to ignore it because we would like the proof to apply also to channels
satisfying (29.76) that are not necessarily Gaussian. To address ties, we shall
assume that they are resolved at random as in Proposition 29.7.1 (i.e., as in the
Definition 21.3.2 of the MAP rule).

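For every $\mathbf{d} \in \mathbb{F}_2^K$ and every $\nu \in \{1, \ldots, 2^K\}$, define the set $\mathcal{D}_{\mathbf{d},\nu} \subset \mathbb{R}^N$ to contain
those $\mathbf{y} \in \mathbb{R}^N$ for which the following two conditions hold:

\[
f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}}(\mathbf{y}) = \max_{\mathbf{d}' \in \mathbb{F}_2^K} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y}), \tag{29.83a}
\]

\[
\#\bigl\{\tilde{\mathbf{d}} \in \mathbb{F}_2^K : f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}}}(\mathbf{y}) = f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}}(\mathbf{y})\bigr\} = \nu. \tag{29.83b}
\]

Whenever $\mathbf{y} \in \mathcal{D}_{\mathbf{d},\nu}$, the MAP rule guesses "$\mathbf{D} = \mathbf{d}$" with probability $1/\nu$. Thus,

\[
p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) = 1 - \sum_{\nu=1}^{2^K} \frac{1}{\nu} \int_{\mathbf{y} \in \mathbb{R}^N} \mathrm{I}\{\mathbf{y} \in \mathcal{D}_{\mathbf{d},\nu}\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}}(\mathbf{y})\, d\mathbf{y}. \tag{29.84}
\]

The key is to note that, by Lemma 29.9.2 (v), for every $\mathbf{d} \in \mathbb{F}_2^K$ and $\nu \in \{1, \ldots, 2^K\}$

\[
\bigl(\mathbf{y} \in \mathcal{D}_{\mathbf{d},\nu}\bigr) \iff \bigl(\psi_{\mathbf{d}}(\mathbf{y}) \in \mathcal{D}_{\mathbf{0},\nu}\bigr). \tag{29.85}
\]

(Please pause to verify this.) Consequently, by (29.84),

\[
\begin{aligned}
p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d})
&= 1 - \sum_{\nu=1}^{2^K} \frac{1}{\nu} \int_{\mathbf{y} \in \mathbb{R}^N} \mathrm{I}\{\mathbf{y} \in \mathcal{D}_{\mathbf{d},\nu}\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}}(\mathbf{y})\, d\mathbf{y} \\
&= 1 - \sum_{\nu=1}^{2^K} \frac{1}{\nu} \int_{\mathbf{y} \in \mathbb{R}^N} \mathrm{I}\{\mathbf{y} \in \mathcal{D}_{\mathbf{d},\nu}\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}\bigl(\psi_{\mathbf{d}}(\mathbf{y})\bigr)\, d\mathbf{y} \\
&= 1 - \sum_{\nu=1}^{2^K} \frac{1}{\nu} \int_{\tilde{\mathbf{y}} \in \mathbb{R}^N} \mathrm{I}\{\psi_{\mathbf{d}}(\tilde{\mathbf{y}}) \in \mathcal{D}_{\mathbf{d},\nu}\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}(\tilde{\mathbf{y}})\, d\tilde{\mathbf{y}} \\
&= 1 - \sum_{\nu=1}^{2^K} \frac{1}{\nu} \int_{\tilde{\mathbf{y}} \in \mathbb{R}^N} \mathrm{I}\{\tilde{\mathbf{y}} \in \mathcal{D}_{\mathbf{0},\nu}\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}(\tilde{\mathbf{y}})\, d\tilde{\mathbf{y}} \\
&= p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{0}),
\end{aligned}
\]

where the first equality follows from (29.84); the second by Lemma 29.9.2 (v); the
third by defining $\tilde{\mathbf{y}} = \psi_{\mathbf{d}}(\mathbf{y})$ and using Parts (iv) and (iii) of Lemma 29.9.2; the
fourth by (29.85); and the final equality by (29.84). □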

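We now formulate a similar result for the detector of Proposition 29.8.1. Let
$p^*_\kappa(\text{error} \mid \mathbf{D} = \mathbf{d})$ denote the conditional probability that the decoder of
Proposition 29.8.1 incorrectly decodes the κ-th data bit, conditional on the tuple $\mathbf{d}$ being
fed to the encoder. Since the data are IID random bits,

\[
p^*_\kappa = 2^{-K} \sum_{\mathbf{d} \in \mathbb{F}_2^K} p^*_\kappa(\text{error} \mid \mathbf{D} = \mathbf{d}). \tag{29.86}
\]

Since ties are resolved at random,

\[
\begin{aligned}
p^*_\kappa(\text{error} \mid \mathbf{D} = \mathbf{d})
&= \Pr\Biggl[\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) > \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) \Biggm| \mathbf{D} = \mathbf{d}\Biggr] \\
&\quad + \frac{1}{2} \Pr\Biggl[\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) = \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) \Biggm| \mathbf{D} = \mathbf{d}\Biggr], \qquad \mathbf{d} \in \mathcal{A}_{\kappa,1},
\end{aligned}
\tag{29.87}
\]

and

\[
\begin{aligned}
p^*_\kappa(\text{error} \mid \mathbf{D} = \mathbf{d})
&= \Pr\Biggl[\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) < \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) \Biggm| \mathbf{D} = \mathbf{d}\Biggr] \\
&\quad + \frac{1}{2} \Pr\Biggl[\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) = \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) \Biggm| \mathbf{D} = \mathbf{d}\Biggr], \qquad \mathbf{d} \in \mathcal{A}_{\kappa,0}.
\end{aligned}
\tag{29.88}
\]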



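Theorem 29.9.4. Under the assumptions of Theorem 29.9.3, we have for every
$\kappa \in \{1, \ldots, K\}$

\[
p^*_\kappa(\text{error} \mid \mathbf{D} = \mathbf{d}) = p^*_\kappa(\text{error} \mid \mathbf{D} = \mathbf{0}), \qquad \mathbf{d} \in \mathbb{F}_2^K, \tag{29.89}
\]

and consequently

\[
p^*_\kappa = p^*_\kappa(\text{error} \mid \mathbf{D} = \mathbf{0}). \tag{29.90}
\]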



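Proof. It suffices to prove (29.89) because (29.90) will then follow by (29.86). To
prove (29.89) we begin by defining $e(\mathbf{d})$ for $\mathbf{d} \in \mathbb{F}_2^K$ as follows. If $\mathbf{d}$ is in $\mathcal{A}_{\kappa,1}$, then
we define $e(\mathbf{d})$ as

\[
e(\mathbf{d}) = \Pr\Biggl[\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) > \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) \Biggm| \mathbf{D} = \mathbf{d}\Biggr], \qquad \mathbf{d} \in \mathcal{A}_{\kappa,1}.
\]

Otherwise, if $\mathbf{d}$ is in $\mathcal{A}_{\kappa,0}$, then we define $e(\mathbf{d})$ as

\[
e(\mathbf{d}) = \Pr\Biggl[\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) < \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) \Biggm| \mathbf{D} = \mathbf{d}\Biggr], \qquad \mathbf{d} \in \mathcal{A}_{\kappa,0}.
\]

We shall prove (29.89) for the case where

\[
\mathbf{d} \in \mathcal{A}_{\kappa,1}. \tag{29.91}
\]

The proof for the case where $\mathbf{d} \in \mathcal{A}_{\kappa,0}$ is almost identical and is omitted. For $\mathbf{d}$
satisfying (29.91) we shall prove that $e(\mathbf{d})$ does not depend on $\mathbf{d}$. The second term
in (29.87), which accounts for the random resolution of ties, can be treated very
similarly. To show that $e(\mathbf{d})$ does not depend on $\mathbf{d}$ we compute:

\[
\begin{aligned}
e(\mathbf{d})
&= \int_{\mathbb{R}^N} \mathrm{I}\Biggl\{\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y}) > \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y})\Biggr\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}}(\mathbf{y})\, d\mathbf{y} \\
&= \int_{\mathbb{R}^N} \mathrm{I}\Biggl\{\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y}) > \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y})\Biggr\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}\bigl(\psi_{\mathbf{d}}(\mathbf{y})\bigr)\, d\mathbf{y} \\
&= \int_{\mathbb{R}^N} \mathrm{I}\Biggl\{\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}\bigl(\psi_{\mathbf{d}}(\tilde{\mathbf{y}})\bigr) > \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}\bigl(\psi_{\mathbf{d}}(\tilde{\mathbf{y}})\bigr)\Biggr\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}(\tilde{\mathbf{y}})\, d\tilde{\mathbf{y}} \\
&= \int_{\mathbb{R}^N} \mathrm{I}\Biggl\{\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}' \oplus \mathbf{d}}(\tilde{\mathbf{y}}) > \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}' \oplus \mathbf{d}}(\tilde{\mathbf{y}})\Biggr\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}(\tilde{\mathbf{y}})\, d\tilde{\mathbf{y}} \\
&= \int_{\mathbb{R}^N} \mathrm{I}\Biggl\{\sum_{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}}}(\tilde{\mathbf{y}}) > \sum_{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}}}(\tilde{\mathbf{y}})\Biggr\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}(\tilde{\mathbf{y}})\, d\tilde{\mathbf{y}} \\
&= e(\mathbf{0}),
\end{aligned}
\]

where the second equality follows from Lemma 29.9.2 (v); the third by defining
the vector $\tilde{\mathbf{y}}$ as $\tilde{\mathbf{y}} = \psi_{\mathbf{d}}(\mathbf{y})$ and by Parts (iv) and (iii) of Lemma 29.9.2; the fourth
by Lemma 29.9.2 (vi); and the fifth equality by defining $\tilde{\mathbf{d}} = \mathbf{d} \oplus \mathbf{d}'$ and using
(29.73). □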



29.10 System Parameters 

We next summarize how the system parameters such as power, bandwidth, and
block error rate are related to the parameters of the encoder. We only address the
case where the pulse shape $\phi$ satisfies the orthonormality condition (29.2). As we
next show, in this case the bandwidth W in Hz of the pulse shape can be expressed
as

\[
W = \frac{1}{2}\, R_b\, \frac{N}{K}\, (1 + \text{excess bandwidth}), \tag{29.92}
\]

where $R_b$ is the bit rate at which the data are fed to the modem in bits per
second, and where the excess bandwidth, which is defined in Definition 11.3.6, is
nonnegative. To verify (29.92) note that if the data arrive at the encoder at the
rate of $R_b$ bits per second and if the encoder produces N real symbols for every K
bits that are fed to it, then the encoder produces real symbols at a rate

\[
R_s = \frac{N}{K}\, R_b \quad \left[\frac{\text{real symbol}}{\text{second}}\right], \tag{29.93}
\]

so the baud period must be

\[
T_s = \frac{K}{N}\, \frac{1}{R_b}. \tag{29.94}
\]

It then follows from Definition 11.3.6 that the bandwidth of $\phi$ is given by (29.92)
with the excess bandwidth being nonnegative by Corollary 11.3.5.

As to the transmitted power P, by (29.27) and (29.94) it is given by

\[
P = E_b R_b, \tag{29.95}
\]

where $E_b$ denotes the energy per data bit and is given by

\[
E_b = \frac{N}{K}\, A^2. \tag{29.96}
\]

It is customary to describe the error probability by which one measures performance
as a function of the energy-per-bit $E_b$.$^3$ Thus, for example, one typically writes the
upper bound (29.55) on the probability of a block error using (29.96) as

\[
p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) \le \sum_{\nu = d_{\min,H}}^{N} \#\bigl\{\mathbf{c} \in \mathrm{Image}(T) : w_H(\mathbf{c}) = \nu\bigr\}\; Q\Biggl(\sqrt{\frac{2 E_b (K/N)\, \nu}{N_0}}\Biggr). \tag{29.97}
\]

29.11 Hard vs. Soft Decisions 

In Section 29.7 we derived the decision rule that minimizes the probability of a block
error. We saw that, in general, its complexity is exponential in the dimension K of
the code because a brute-force implementation of this rule requires correlating the
N-tuple $\mathbf{Y}$ with each of the $2^K$ tuples in Image(enc). For the single parity check
code we found a much simpler implementation of this rule, but for general codes
the decoding problem can be very difficult.

A suboptimal decoding rule that is sometimes implemented is the Hard Decision
decoding rule, which has two steps. In the first step one uses the observed
real-valued N-tuple $(Y_1, \ldots, Y_N)$ to form the binary tuple $(\hat{C}_1, \ldots, \hat{C}_N)$ according to
the rule

\[
\hat{C}_\eta =
\begin{cases}
0 & \text{if } Y_\eta \ge 0, \\
1 & \text{if } Y_\eta < 0,
\end{cases}
\qquad \eta = 1, \ldots, N,
\]

and in the second step one searches for the message $\mathbf{d}$ for which $T(\mathbf{d})$ is closest in
Hamming distance to $(\hat{C}_1, \ldots, \hat{C}_N)$. The advantage of this decoding rule is that
the first step is very simple and that the second step can often be performed very
efficiently if the code has a strong algebraic structure.
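The two steps of hard-decision decoding can be sketched as follows. This is a brute-force illustration, not an efficient decoder: it assumes a hypothetical K x N generator matrix `G` over F_2, searches all $2^K$ messages in the second step rather than exploiting algebraic structure, and breaks ties in favor of the first message found.

```python
from itertools import product

def hard_decision_decode(y, G):
    """Two-step hard-decision decoding; G is a K x N generator matrix over F_2."""
    # Step 1: threshold each observation (nonnegative -> 0, negative -> 1).
    c_hat = [0 if yi >= 0 else 1 for yi in y]
    # Step 2: brute-force search for the message whose codeword is closest
    # in Hamming distance to the thresholded tuple.
    best_d, best_dist = None, None
    for d in product((0, 1), repeat=len(G)):
        codeword = [sum(d[k] * G[k][j] for k in range(len(G))) % 2 for j in range(len(y))]
        dist = sum(a != b for a, b in zip(codeword, c_hat))
        if best_dist is None or dist < best_dist:
            best_d, best_dist = d, dist
    return best_d

if __name__ == "__main__":
    repetition = [[1, 1, 1]]  # (1, 3) repetition code
    print(hard_decision_decode([0.5, 0.3, -2.0], repetition))
```

For the (1, 3) repetition code and the observation (0.5, 0.3, −2.0), thresholding yields (0, 0, 1), so the sketch decides on the message 0. Correlating the observation with the two antipodal codewords would instead favor the all-one codeword, because the components sum to −1.2; this is the kind of information that hard decisions discard.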



29.12 The Varshamov and Singleton Bounds 



Motivated by the approximation (29.56) and by (29.58), a fair bit of effort in
Coding Theory has been invested in finding (K, N) codes that have a large minimum
Hamming weight and reasonable decoding complexity. One of the key existence
results in this area is the Varshamov Bound. We state here a special case of this
bound pertaining to our binary setting.

3 The terms "energy-per-bit," "energy-per-data-bit," and "energy-per-information-bit" are
used interchangeably.

Theorem 29.12.1 (The Varshamov Bound). Let K and N be positive integers,
and let d be an integer in the range $2 \le d \le N - K + 1$. If

\[
\sum_{\nu=0}^{d-2} \binom{N-1}{\nu} < 2^{N-K}, \tag{29.98}
\]

then there exists a linear (K, N) $\mathbb{F}_2$ code whose minimum distance $d_{\min,H}$ satisfies
$d_{\min,H} \ge d$.

Proof. See, for example, (MacWilliams and Sloane, 1977, Chapter 1, Section 10,
Theorem 12) or (Blahut, 2002, Chapter 12, Section 3, Theorem 12.3.3). □

A key upper bound on $d_{\min,H}$ is given by the Singleton Bound.

Theorem 29.12.2 (The Singleton Bound). If N and K are positive integers, then
the minimum Hamming distance $d_{\min,H}$ of any linear (K, N) $\mathbb{F}_2$ code must satisfy

\[
d_{\min,H} \le N - K + 1. \tag{29.99}
\]

Proof. See, for example, (Blahut, 2002, Chapter 3, Section 3, Theorem 3.2.6) or
(van Lint, 1998, Chapter 5, Section 2, Corollary 5.2.2) or Exercise 29.10. □
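Checking the hypothesis (29.98) of the Varshamov Bound and evaluating the Singleton Bound (29.99) for given parameters is a short computation. The sketch below is illustrative; the function names are hypothetical, and a `False` result from the first function only means that the theorem is silent, not that no such code exists.

```python
from math import comb

def varshamov_guarantees(k, n, d):
    """Return True if (29.98) holds, i.e., if Theorem 29.12.1 guarantees a
    linear (K, N) binary code with minimum Hamming distance at least d."""
    if not (2 <= d <= n - k + 1):
        raise ValueError("d must satisfy 2 <= d <= N - K + 1")
    # Sum of binomial coefficients C(N-1, nu) for nu = 0, ..., d-2.
    return sum(comb(n - 1, nu) for nu in range(d - 1)) < 2 ** (n - k)

def singleton_upper_bound(k, n):
    """Largest minimum Hamming distance allowed by the Singleton Bound (29.99)."""
    return n - k + 1

if __name__ == "__main__":
    # K = 11, N = 15: 1 + 14 = 15 < 16, so distance 3 is guaranteed;
    # adding C(14, 2) = 91 gives 106, which is not below 16.
    print(varshamov_guarantees(11, 15, 3))  # True
    print(varshamov_guarantees(11, 15, 4))  # False
    print(singleton_upper_bound(11, 15))    # 5
```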



29.13 Additional Reading 

We have only had a glimpse of Coding Theory. A good starting point for the 
literature on Algebraic Coding Theory is (Roth, 2006). For more on the modern 
coding techniques such as low-density parity-check codes (LDPC) and turbo-codes, 
see (Richardson and Urbanke, 2008). 

The degradation resulting from hard decisions is addressed, e.g., in (Viterbi and
Omura, 1979, Chapter 3, Section 3.4).

The results of Section 29.9 can be extended also to non-binary codes with other
mappings. See, for example, (Loeliger, 1991) and (Forney, 1991).

For some of the literature on the minimum distance and its asymptotic behavior
in the block length, see, for example, (Roth, 2006, Chapter 4).

For more on the decoding complexity see the notes on Section 2.4 in Chapter 2 of 
(Roth, 2006). 

29.14 Exercises 

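Exercise 29.1 (Orthogonality of Signals). Recall that, given a binary K-tuple $\mathbf{d} \in \mathbb{F}_2^K$
and a linear (K, N) $\mathbb{F}_2$ encoder T(·), we use $x_\eta(\mathbf{d})$ to denote the result of applying the
antipodal mapping $\Upsilon(\cdot)$ to the η-th component of $T(\mathbf{d})$. Let the pulse shape $\phi$ be such
that its time shifts by integer multiples of the baud period $T_s$ are orthonormal. Show that

\[
\Bigl\langle t \mapsto \sum_{\eta=1}^{N} x_\eta(\mathbf{d})\, \phi(t - \eta T_s),\; t \mapsto \sum_{\eta'=1}^{N} x_{\eta'}(\mathbf{d}')\, \phi(t - \eta' T_s) \Bigr\rangle = 0
\]

if, and only if, $d_H\bigl(T(\mathbf{d}), T(\mathbf{d}')\bigr) = N/2$.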

Exercise 29.2 (How Many Encoders Does a Code Have?). Let the linear (K, N) $\mathbb{F}_2$
encoder $T\colon \mathbb{F}_2^K \to \mathbb{F}_2^N$ be represented by the K × N matrix G. Show that any linear
(K, N) $\mathbb{F}_2$ encoder whose image is equal to the image of T can be written in the form

\[
\mathbf{d} \mapsto \mathbf{d}\mathsf{A}\mathsf{G},
\]

where A is a K × K invertible matrix whose entries are in $\mathbb{F}_2$. How many such matrices A
are there?

Exercise 29.3 (The (4,7) Hamming Code). A systematic encoder for the linear (4,7) $\mathbb{F}_2$
Hamming code maps the four data bits $d_1, d_2, d_3, d_4$ to the 7-tuple

\[
\bigl(d_1, d_2, d_3, d_4, d_1 \oplus d_3 \oplus d_4, d_1 \oplus d_2 \oplus d_4, d_2 \oplus d_3 \oplus d_4\bigr).
\]

Suppose that this encoder is used in conjunction with the componentwise antipodal
mapping $\Upsilon_7(\cdot)$ over the white Gaussian noise channel with PAM of pulse shape whose time
shifts by integer multiples of the baud period are orthonormal.

(i) Write out the 16 binary codewords and compute the code's weight enumerator.

(ii) Assuming that the codewords are equally likely and that the decoding minimizes the
probability of a message error, use the Union Bound to upper-bound the probability
of codeword error. Express your bound using the transmitted energy per bit $E_b$.

(iii) Find a lower bound on the probability that the first bit $D_1$ is incorrectly decoded.
Express your bound in terms of the energy per bit. Compare with the exact
expression in uncoded communication.

Exercise 29.4 (The Repetition Code). Consider the linear (1, N) $\mathbb{F}_2$ repetition code
consisting of the all-zero and all-one N-tuples (0, . . . , 0) and (1, . . . , 1).

(i) Find its weight enumerator.

(ii) Find an optimal decoder for a system employing this code with the componentwise
antipodal mapping $\Upsilon_N(\cdot)$ over the white Gaussian noise channel in conjunction
with PAM with a pulse shape whose time shifts by integer multiples of the baud
period are orthonormal.

(iii) Find the optimal probability of error. Express your answer using the energy per
bit $E_b$. Compare with uncoded antipodal signaling.

(iv) Describe the hard decision rule for this setup. Find its performance in terms of $E_b$.

Exercise 29.5 (The Dual Code). We say that two binary κ-tuples $\mathbf{u} = (u_1, \ldots, u_\kappa)$ and
$\mathbf{v} = (v_1, \ldots, v_\kappa)$ are orthogonal if

\[
u_1 \cdot v_1 \oplus u_2 \cdot v_2 \oplus \cdots \oplus u_\kappa \cdot v_\kappa = 0.
\]

Consider the set of all N-tuples that are orthogonal to every codeword of some given
linear (K, N) $\mathbb{F}_2$ code. Show that this set is a linear (N − K, N) $\mathbb{F}_2$ code. This code is
called the dual code. What is the dual code of the (K, K + 1) single parity check code?

Exercise 29.6 (Hadamard Code). For a positive integer N which is a power of two, define
the N × N binary matrix $\mathsf{H}_N$ recursively as

\[
\mathsf{H}_2 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
\mathsf{H}_N = \begin{pmatrix} \mathsf{H}_{N/2} & \mathsf{H}_{N/2} \\ \mathsf{H}_{N/2} & \bar{\mathsf{H}}_{N/2} \end{pmatrix}, \qquad N = 4, 8, 16, \ldots, \tag{29.100}
\]

where $\bar{\mathsf{H}}$ denotes the componentwise negation of the matrix $\mathsf{H}$, that is, the matrix whose
Row-j Column-ℓ element is given by $1 \oplus [\mathsf{H}]_{j,\ell}$, where $[\mathsf{H}]_{j,\ell}$ is the Row-j Column-ℓ element
of $\mathsf{H}$. Consider the set of all rows of $\mathsf{H}_N$.

(i) Show that this collection of N binary N-tuples forms a linear $(\log_2 N, N)$ $\mathbb{F}_2$ code.
This code is called the Hadamard code. Find this code's weight enumerator.

(ii) Suppose that, as in Section 29.6, this code is used in conjunction with PAM over
the white Gaussian noise channel and that $Y_1, \ldots, Y_N$ are as defined there. Show
that the following rule minimizes the probability of a message error: compute the
vector

\[
\tilde{\mathsf{H}}_N \begin{pmatrix} Y_1 \\ \vdots \\ Y_N \end{pmatrix} \tag{29.101}
\]

and guess that the m-th message was sent if the m-th component of this vector is
largest. Here $\tilde{\mathsf{H}}_N$ is the N × N matrix whose Row-j Column-ℓ entry is the result
of applying $\Upsilon(\cdot)$ to the Row-j Column-ℓ entry of $\mathsf{H}_N$.

(iii) A brute-force computation of the vector in (29.101) requires $N^2$ additions, which
translates to $N^2/\log_2 N$ additions per information bit. Use the structure of $\mathsf{H}_N$
that is given in (29.100) to show that this can be done with $N \log_2 N$ additions
(or N additions per information bit).

Hint: For Part (iii) provide an algorithm for which c(N) = 2c(N/2) + N, where c(n)
denotes the number of additions needed to compute this vector when the matrix is n × n.
Show that the solution to this recursion for c(2) = 2 is $c(n) = n \log_2 n$.

Exercise 29.7 (Bi-Orthogonal Code). Referring to the notation introduced in
Exercise 29.6, consider the 2N × N matrix

\[
\begin{pmatrix} \mathsf{H}_N \\ \bar{\mathsf{H}}_N \end{pmatrix},
\]

where N is some positive power of two.

(i) Show that the rows of this matrix form a linear $\bigl(\log_2(2N), N\bigr)$ $\mathbb{F}_2$ code.

(ii) Compute the code's weight enumerator.

(iii) Explain why we chose the title "Bi-Orthogonal Code" for this exercise.

(iv) Find an efficient decoding algorithm for the setup of Section 29.6.

Exercise 29.8 (Non-IID Data). How would you modify the decision rule of Section 29.8 if
the data bits $(D_1, \ldots, D_K)$ are not necessarily IID but have the general joint probability
mass function $P_{\mathbf{D}}(\cdot)$?

Exercise 29.9 (Asymmetric Channels). Show that Theorem 29.9.3 will no longer hold if
we drop the hypothesis that the channel is symmetric.

Exercise 29.10 (A Proof of the Singleton Bound). Use the following steps to prove the
Singleton Bound.

(i) Consider a linear (K, N) $\mathbb{F}_2$ code. Let $\pi\colon \mathbb{F}_2^N \to \mathbb{F}_2^{K-1}$ map each N-tuple to the
(K − 1)-tuple consisting of its first K − 1 components. By comparing the number of
codewords with the cardinality of the range of $\pi$, argue that there must exist two
codewords whose first K − 1 components are identical.

(ii) Show that these two codewords are at Hamming distance of at most N − K + 1.

(iii) Show that the minimum Hamming distance of the code is at most N − K + 1.

(iv) Does linearity play a role in the proof?

Exercise 29.11 (Binary MDS Codes). Codes that satisfy the Singleton Bound with
equality are called Maximum Distance Separable (MDS). Show that the linear (K, K + 1) $\mathbb{F}_2$
single parity check code is MDS. Can you think of other binary MDS codes?

Exercise 29.12 (Existence via the Varshamov Bound). Can the existence of a linear (4, 7)
$\mathbb{F}_2$ code of minimum Hamming distance 3 be deduced from the Varshamov Bound?



Appendix A 

On the Fourier Series 

A.1 Introduction and Preliminaries

We survey here some of the results on the Fourier Series that are used in the book. 
The Fourier Series has numerous other applications that we do not touch upon. 
For those we refer the reader to (Katznelson, 1976), (Dym and McKean, 1972), 
and (Korner, 1988). 

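To simplify typography, we denote the half-open interval [−1/2, 1/2) by $\mathbb{I}$:

\[
\mathbb{I} \triangleq \Bigl\{\theta \in \mathbb{R} : -\frac{1}{2} \le \theta < \frac{1}{2}\Bigr\}. \tag{A.1}
\]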



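Definition A.1.1 (Fourier Series Coefficient). The η-th Fourier Series
Coefficient of an integrable function $g\colon \mathbb{I} \to \mathbb{C}$ is denoted by $\hat{g}(\eta)$ and is defined for
every integer η by

\[
\hat{g}(\eta) \triangleq \int_{\mathbb{I}} g(\theta)\, e^{-i 2\pi \eta \theta}\, d\theta. \tag{A.2}
\]

The periodic extension of the function $g\colon \mathbb{I} \to \mathbb{C}$ is denoted by $g_P\colon \mathbb{R} \to \mathbb{C}$ and
is defined as

\[
g_P(n + \theta) = g(\theta), \qquad \bigl(n \in \mathbb{Z},\ \theta \in \mathbb{I}\bigr). \tag{A.3}
\]

We say that $g\colon \mathbb{I} \to \mathbb{C}$ is periodically continuous if its periodic extension $g_P$ is
continuous, i.e., if g(·) is continuous in $\mathbb{I}$ and if, additionally,

\[
\lim_{\theta \uparrow 1/2} g(\theta) = g(-1/2). \tag{A.4}
\]

A degree-n trigonometric polynomial is a function of the form

\[
\theta \mapsto \sum_{\eta=-n}^{n} a_\eta\, e^{i 2\pi \eta \theta}, \qquad \theta \in \mathbb{R}, \tag{A.5}
\]

where $a_n$ and $a_{-n}$ are not both zero. Note that if p(·) is a trigonometric polynomial,
then $p(\theta + 1) = p(\theta)$ for all $\theta \in \mathbb{R}$.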


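If $g\colon \mathbb{I} \to \mathbb{C}$ is integrable, and if p(·) is a trigonometric polynomial, then we define
the convolution $g \star p$ at every $\theta \in \mathbb{R}$ as

\[
\begin{aligned}
(g \star p)(\theta) &= \int_{\mathbb{I}} g(\vartheta)\, p(\theta - \vartheta)\, d\vartheta && (A.6) \\
&= \int_{\mathbb{I}} p(\vartheta)\, g_P(\theta - \vartheta)\, d\vartheta. && (A.7)
\end{aligned}
\]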



Lemma A. 1.2 (Convolution with a Trigonometric Polynomial). TTie convolution 
of an integrable function g : I — > C wiift ifte trigonometric polynomial 

n 

6^ Y, a v e ' 27Tv9 ( A - 8 ) 

r/— — n 

is the trigonometric polynomial 

n 

0^ Yl g(v) a v e^ r ' e , 6eR. (A.9) 

'q——n 

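Proof. Denote the trigonometric polynomial in (A.8) by p(·). By swapping summation and integration we obtain

\begin{align*}
(g \star p)(\theta) &= \int_{\mathbb{I}} g(\vartheta)\, p(\theta - \vartheta)\, d\vartheta \\
&= \int_{\mathbb{I}} g(\vartheta) \sum_{\eta=-n}^{n} a_\eta\, e^{i2\pi\eta(\theta - \vartheta)}\, d\vartheta \\
&= \sum_{\eta=-n}^{n} a_\eta \int_{\mathbb{I}} g(\vartheta)\, e^{i2\pi\eta(\theta - \vartheta)}\, d\vartheta \\
&= \sum_{\eta=-n}^{n} a_\eta\, e^{i2\pi\eta\theta} \int_{\mathbb{I}} g(\vartheta)\, e^{-i2\pi\eta\vartheta}\, d\vartheta \\
&= \sum_{\eta=-n}^{n} a_\eta\, e^{i2\pi\eta\theta}\, \hat{g}(\eta), \quad \theta \in \mathbb{R}. \qquad \Box
\end{align*}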
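The identity in Lemma A.1.2 can be checked numerically along the following lines. This is only a sketch: the helper names (`midpoints`, `coefficient`, `trig_poly`, `convolve`, `lemma_rhs`), the test function g(θ) = θ, the coefficients in `A_COEFFS`, and the midpoint quadrature are all assumptions made for illustration.

```python
import cmath
import math

NUM_POINTS = 1000  # midpoint-rule resolution (illustrative assumption)


def midpoints():
    """Midpoints of NUM_POINTS equal subintervals of I = [-1/2, 1/2)."""
    step = 1.0 / NUM_POINTS
    return [-0.5 + (k + 0.5) * step for k in range(NUM_POINTS)]


def coefficient(g, eta):
    """Midpoint approximation of the eta-th Fourier Series Coefficient (A.2)."""
    return sum(g(v) * cmath.exp(-2j * math.pi * eta * v) for v in midpoints()) / NUM_POINTS


def trig_poly(a, theta):
    """Evaluate sum over eta of a[eta] * e^{i 2 pi eta theta}; a is a dict keyed by eta."""
    return sum(c * cmath.exp(2j * math.pi * eta * theta) for eta, c in a.items())


def convolve(g, a, theta):
    """Midpoint approximation of (g * p)(theta) as defined in (A.6)."""
    return sum(g(v) * trig_poly(a, theta - v) for v in midpoints()) / NUM_POINTS


def lemma_rhs(g, a, theta):
    """The polynomial (A.9), whose coefficients are g_hat(eta) * a[eta]."""
    return sum(coefficient(g, eta) * c * cmath.exp(2j * math.pi * eta * theta)
               for eta, c in a.items())


def ramp(theta):
    return theta  # an integrable function on I that is not periodically continuous


A_COEFFS = {-2: 0.5, -1: 1j, 0: 2.0, 1: -1.0, 2: 0.25}  # a degree-2 polynomial

if __name__ == "__main__":
    for theta in (-0.3, 0.0, 0.4):
        print(theta, abs(convolve(ramp, A_COEFFS, theta) - lemma_rhs(ramp, A_COEFFS, theta)))
```

Because both sides use the same quadrature points, the two values agree up to rounding, which mirrors the swap of the finite sum and the integral in the proof.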

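Definition A.1.3 (Fejér's Kernel). Fejér's degree-n kernel k_n is the trigonometric polynomial

\begin{align}
k_n(\theta) &= \sum_{\eta=-n}^{n} \Bigl(1 - \frac{|\eta|}{n+1}\Bigr) e^{i2\pi\eta\theta} \tag{A.10a} \\
&= \begin{cases} n + 1 & \text{if } \theta \in \mathbb{Z}, \\[4pt] \dfrac{1}{n+1} \Bigl(\dfrac{\sin\bigl((n+1)\pi\theta\bigr)}{\sin(\pi\theta)}\Bigr)^{2} & \text{if } \theta \notin \mathbb{Z}. \end{cases} \tag{A.10b}
\end{align}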



The key properties of Fejér's kernel are that it is nonnegative

$$k_n(\theta) \ge 0, \quad \theta \in \mathbb{R}; \tag{A.11a}$$

that it integrates over I to one

$$\int_{\mathbb{I}} k_n(\theta)\, d\theta = 1; \tag{A.11b}$$

and that for every fixed 0 < δ < 1/2

$$\lim_{n \to \infty} \int_{\delta < |\theta| < \frac{1}{2}} k_n(\theta)\, d\theta = 0. \tag{A.11c}$$

Here (A.11a) follows from (A.10b); (A.11b) follows from (A.10a) by term-by-term integration over I; and (A.11c) follows from the inequality

$$k_n(\theta) \le \frac{1}{n+1} \Bigl(\frac{1}{\sin \pi\delta}\Bigr)^{2}, \quad \delta < |\theta| < \frac{1}{2},$$

which follows from (A.10b) by upper-bounding the numerator by 1 and by using the monotonicity of sin(πθ) in |θ| ∈ [0, 1/2].
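The three properties (A.11a) to (A.11c) can be explored numerically with a sketch like the one below. The function names, the midpoint quadrature, and the sample values of n and δ are illustrative assumptions, not part of the text.

```python
import cmath
import math


def fejer_sum(n, theta):
    """Fejer's degree-n kernel via the trigonometric sum (A.10a)."""
    total = sum((1 - abs(eta) / (n + 1)) * cmath.exp(2j * math.pi * eta * theta)
                for eta in range(-n, n + 1))
    return total.real  # imaginary parts cancel in conjugate pairs


def fejer_closed(n, theta):
    """Fejer's degree-n kernel via the closed form (A.10b)."""
    if theta == round(theta):  # theta is an integer
        return float(n + 1)
    ratio = math.sin((n + 1) * math.pi * theta) / math.sin(math.pi * theta)
    return ratio ** 2 / (n + 1)


def midpoint_integral(f, a, b, num_points=4000):
    """Midpoint Riemann sum of f over [a, b] (illustrative quadrature)."""
    step = (b - a) / num_points
    return step * sum(f(a + (k + 0.5) * step) for k in range(num_points))


def tail_mass(n, delta):
    """Integral of k_n over delta < |theta| < 1/2, using the symmetry of k_n."""
    return 2 * midpoint_integral(lambda t: fejer_closed(n, t), delta, 0.5)


if __name__ == "__main__":
    print(fejer_sum(5, 0.3), fejer_closed(5, 0.3))                 # two forms of k_5
    print(midpoint_integral(lambda t: fejer_closed(7, t), -0.5, 0.5))  # close to 1
    print(tail_mass(10, 0.1), tail_mass(100, 0.1))                 # tail shrinks with n
```

The tail mass for a fixed δ shrinks roughly like 1/(n + 1), in line with the upper bound used to justify (A.11c).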

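For an integrable function g: I → C, we define for every n ∈ N and θ ∈ R

\begin{align}
\sigma_n(g, \theta) &= (g \star k_n)(\theta) \tag{A.12} \\
&= \sum_{\eta=-n}^{n} \Bigl(1 - \frac{|\eta|}{n+1}\Bigr) \hat{g}(\eta)\, e^{i2\pi\eta\theta}, \tag{A.13}
\end{align}

where the second equality follows from (A.10a) and Lemma A.1.2. We also define for g: R → C or g: I → C

$$\|g\|_{\mathbb{I},1} = \int_{\mathbb{I}} |g(\theta)|\, d\theta. \tag{A.14}$$

Finally, for every function h: I → C and ϑ ∈ I we define the mapping h_ϑ: R → C as

$$h_\vartheta \colon \theta \mapsto h_P(\theta - \vartheta). \tag{A.15}$$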

A.2 Reconstruction in $\mathcal{L}_1$

Lemma A.2.1. If g: R → C is integrable over I and g(θ + 1) = g(θ) for every θ ∈ R, then

$$\lim_{\vartheta \to 0} \int_{\mathbb{I}} |g(\theta) - g(\theta - \vartheta)|\, d\theta = 0. \tag{A.16}$$

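Proof. This is easy to see if g is continuous, because in this case g is uniformly continuous. The general result follows from this case by picking a periodic continuous function h that approximates g in the sense that ‖g − h‖_{I,1} < ε/2; by computing

\begin{align}
\|g - g_\vartheta\|_{\mathbb{I},1} &= \|g - h + h - g_\vartheta\|_{\mathbb{I},1} \notag \\
&= \|g - h + h - h_\vartheta + h_\vartheta - g_\vartheta\|_{\mathbb{I},1} \notag \\
&\le \|g - h\|_{\mathbb{I},1} + \|h - h_\vartheta\|_{\mathbb{I},1} + \|h_\vartheta - g_\vartheta\|_{\mathbb{I},1} \notag \\
&= \|g - h\|_{\mathbb{I},1} + \|h - h_\vartheta\|_{\mathbb{I},1} + \|h - g\|_{\mathbb{I},1} \notag \\
&< \epsilon + \|h - h_\vartheta\|_{\mathbb{I},1}; \tag{A.17}
\end{align}

and by then applying the result to h, which is continuous. □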

Theorem A.2.2 (Reconstruction in $\mathcal{L}_1$). If g: I → C is integrable, then

$$\lim_{n \to \infty} \int_{\mathbb{I}} |g(\theta) - \sigma_n(g, \theta)|\, d\theta = 0. \tag{A.18}$$

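Proof. Let g_P be the periodic extension of g. Then for every δ ∈ (0, 1/2),

\begin{align}
\int_{\mathbb{I}} |g(\theta) - \sigma_n(g, \theta)|\, d\theta
&= \int_{\mathbb{I}} \Bigl| g_P(\theta) - \int_{\mathbb{I}} g_P(\theta - \vartheta)\, k_n(\vartheta)\, d\vartheta \Bigr|\, d\theta \notag \\
&= \int_{\mathbb{I}} \Bigl| \int_{\mathbb{I}} k_n(\vartheta) \bigl(g_P(\theta) - g_P(\theta - \vartheta)\bigr)\, d\vartheta \Bigr|\, d\theta \notag \\
&\le \int_{\mathbb{I}} \Bigl| \int_{-\delta}^{\delta} k_n(\vartheta) \bigl(g_P(\theta) - g_P(\theta - \vartheta)\bigr)\, d\vartheta \Bigr|\, d\theta \notag \\
&\quad + \int_{\mathbb{I}} \Bigl| \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta) \bigl(g_P(\theta) - g_P(\theta - \vartheta)\bigr)\, d\vartheta \Bigr|\, d\theta, \tag{A.19}
\end{align}

where the first equality follows from the definition of σ_n(g, θ) (A.12), and where the second equality follows from (A.11b). We now bound the two integrals in (A.19) separately: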

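\begin{align}
\int_{\mathbb{I}} \Bigl| \int_{-\delta}^{\delta} k_n(\vartheta) \bigl(g_P(\theta) - g_P(\theta - \vartheta)\bigr)\, d\vartheta \Bigr|\, d\theta
&\le \int_{\mathbb{I}} \int_{-\delta}^{\delta} k_n(\vartheta)\, |g_P(\theta) - g_P(\theta - \vartheta)|\, d\vartheta\, d\theta \notag \\
&= \int_{-\delta}^{\delta} \int_{\mathbb{I}} k_n(\vartheta)\, |g_P(\theta) - g_P(\theta - \vartheta)|\, d\theta\, d\vartheta \notag \\
&= \int_{-\delta}^{\delta} k_n(\vartheta) \int_{\mathbb{I}} |g_P(\theta) - g_P(\theta - \vartheta)|\, d\theta\, d\vartheta \notag \\
&\le \int_{-\delta}^{\delta} k_n(\vartheta) \max_{|\vartheta'| \le \delta} \Bigl\{ \int_{\mathbb{I}} |g_P(\theta) - g_P(\theta - \vartheta')|\, d\theta \Bigr\}\, d\vartheta \tag{A.20} \\
&\le \int_{\mathbb{I}} k_n(\vartheta) \max_{|\vartheta'| \le \delta} \Bigl\{ \int_{\mathbb{I}} |g_P(\theta) - g_P(\theta - \vartheta')|\, d\theta \Bigr\}\, d\vartheta \notag \\
&= \max_{|\vartheta| \le \delta} \int_{\mathbb{I}} |g_P(\theta) - g_P(\theta - \vartheta)|\, d\theta, \tag{A.21}
\end{align}

where the first inequality follows from the Triangle Inequality for Integrals (Proposition 2.4.1) and the nonnegativity of k_n(·) (A.11a), and where the last equality follows because k_n(·) integrates to one (A.11b).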

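The second integral in (A.19) is bounded as follows:

\begin{align}
\int_{\mathbb{I}} \Bigl| \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta) \bigl(g_P(\theta) - g_P(\theta - \vartheta)\bigr)\, d\vartheta \Bigr|\, d\theta
&\le \int_{\mathbb{I}} \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta)\, |g_P(\theta) - g_P(\theta - \vartheta)|\, d\vartheta\, d\theta \notag \\
&= \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta) \int_{\mathbb{I}} |g_P(\theta) - g_P(\theta - \vartheta)|\, d\theta\, d\vartheta \notag \\
&\le \max_{\vartheta' \in \mathbb{I}} \Bigl\{ \int_{\mathbb{I}} |g_P(\theta) - g_P(\theta - \vartheta')|\, d\theta \Bigr\} \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta)\, d\vartheta \notag \\
&\le 2\, \|g\|_{\mathbb{I},1} \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta)\, d\vartheta. \tag{A.22}
\end{align}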
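From (A.19), (A.21), and (A.22) we obtain

$$\int_{\mathbb{I}} |g(\theta) - \sigma_n(g, \theta)|\, d\theta \le \max_{|\vartheta| \le \delta} \int_{\mathbb{I}} |g_P(\theta - \vartheta) - g_P(\theta)|\, d\theta + 2\, \|g\|_{\mathbb{I},1} \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta)\, d\vartheta. \tag{A.23}$$

Inequality (A.23) establishes the theorem as follows. For every ε > 0 we can find by Lemma A.2.1 some δ > 0 such that

$$\max_{|\vartheta| \le \delta} \int_{\mathbb{I}} |g_P(\theta - \vartheta) - g_P(\theta)|\, d\theta < \epsilon, \tag{A.24}$$

and keeping this δ > 0 fixed we have by (A.11c)

$$\lim_{n \to \infty} 2\, \|g\|_{\mathbb{I},1} \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta)\, d\vartheta = 0. \tag{A.25}$$

It thus follows from (A.23), (A.24), and (A.25) that

$$\varlimsup_{n \to \infty} \int_{\mathbb{I}} |g(\theta) - \sigma_n(g, \theta)|\, d\theta < \epsilon, \tag{A.26}$$

from which the theorem follows because ε > 0 was arbitrary. □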
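To see Theorem A.2.2 at work on a discontinuous function, the sketch below evaluates σ_n(g, θ) through (A.13) for the indicator function of [0, 1/2) and estimates the L1 error for a few values of n. The choice of g, its hand-computed coefficients, the function names, and the midpoint quadrature are assumptions made for illustration.

```python
import cmath
import math


def step_coefficient(eta):
    """Fourier Series Coefficient of g = indicator of [0, 1/2), computed by hand:
    1/2 for eta = 0, zero for even nonzero eta, and -i/(pi*eta) for odd eta."""
    if eta == 0:
        return 0.5
    if eta % 2 == 0:
        return 0.0
    return -1j / (math.pi * eta)


def fejer_mean(coefficient, n, theta):
    """sigma_n(g, theta) computed from the coefficients as in (A.13)."""
    return sum((1 - abs(eta) / (n + 1)) * coefficient(eta)
               * cmath.exp(2j * math.pi * eta * theta)
               for eta in range(-n, n + 1))


def l1_error(n, num_points=2000):
    """Midpoint estimate of the integral over I of |g - sigma_n(g, .)|."""
    step = 1.0 / num_points
    total = 0.0
    for k in range(num_points):
        theta = -0.5 + (k + 0.5) * step
        g_value = 1.0 if theta >= 0 else 0.0
        total += abs(g_value - fejer_mean(step_coefficient, n, theta))
    return total * step


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, l1_error(n))
```

The printed errors decrease as n grows even though g has jumps, which is the kind of convergence that (A.18) guarantees.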

From Theorem A. 2. 2 we obtain: 

Theorem A.2.3 (Uniqueness Theorem). Let g_1, g_2: I → C be integrable. If

$$\hat{g}_1(\eta) = \hat{g}_2(\eta), \quad \eta \in \mathbb{Z}, \tag{A.27}$$

then g_1 and g_2 are equal except on a set of Lebesgue measure zero.

Proof. Let g = g_1 − g_2. By (A.27)

$$\hat{g}(\eta) = 0, \quad \eta \in \mathbb{Z}, \tag{A.28}$$

and consequently, by (A.13), σ_n(g, θ) = 0 for every n ∈ N and θ ∈ I. By Theorem A.2.2

$$\lim_{n \to \infty} \int_{\mathbb{I}} |g(\theta) - \sigma_n(g, \theta)|\, d\theta = 0, \tag{A.29}$$

which combines with (A.28) to establish that

$$\int_{\mathbb{I}} |g(\theta)|\, d\theta = 0.$$

Thus, g is zero except on a set of Lebesgue measure zero (Proposition 2.5.3 (i)), and the result follows by recalling that g = g_1 − g_2. □

Theorem A.2.4 (Riemann-Lebesgue Lemma). If g: I → C is integrable, then

$$\lim_{|\eta| \to \infty} \hat{g}(\eta) = 0. \tag{A.30}$$

Proof. Given any ε > 0, let p be a degree-n trigonometric polynomial satisfying

$$\int_{\mathbb{I}} |g(\theta) - p(\theta)|\, d\theta < \epsilon. \tag{A.31}$$

(Such a trigonometric polynomial exists for some n ∈ N by Theorem A.2.2.) Expressing g as (g − p) + p and using the linearity of the Fourier Series Coefficients we obtain for every integer η whose magnitude exceeds the degree n of p

\begin{align}
|\hat{g}(\eta)| &= \bigl| \widehat{(g - p)}(\eta) + \hat{p}(\eta) \bigr| \notag \\
&= \bigl| \widehat{(g - p)}(\eta) \bigr| \notag \\
&\le \int_{\mathbb{I}} |g(\theta) - p(\theta)|\, d\theta \notag \\
&< \epsilon, \tag{A.32}
\end{align}

where the equality in the first line follows from the linearity of the Fourier Series Coefficient; the equality in the second line because |η| is larger than the degree n of p; the inequality in the third line because for every integrable h: I → C we have

$$|\hat{h}(\eta)| = \Bigl| \int_{\mathbb{I}} h(\theta)\, e^{-i2\pi\eta\theta}\, d\theta \Bigr| \le \int_{\mathbb{I}} \bigl| h(\theta)\, e^{-i2\pi\eta\theta} \bigr|\, d\theta = \int_{\mathbb{I}} |h(\theta)|\, d\theta, \quad \eta \in \mathbb{Z};$$

and where the inequality in the last line of (A.32) follows from (A.31). □



A.3 Geometric Considerations

Every square-integrable function that is zero outside the interval [—1/2, 1/2] is also 
integrable (Proposition 3.4.3). For such functions we can discuss the inner product 
and some of the related geometry. The main result is the following. 

Theorem A.3.1 (Complete Orthonormal System). The bi-infinite sequence of functions …, φ_{−1}, φ_0, φ_1, … defined for every η ∈ Z by

$$\phi_\eta(\theta) = e^{i2\pi\eta\theta}\, \mathrm{I}\{\theta \in \mathbb{I}\}, \quad \theta \in \mathbb{R}$$

forms a complete orthonormal system for the subspace of $\mathcal{L}_2$ consisting of those energy-limited functions that are zero outside the interval I.



Proof. The orthonormality follows by direct calculation

$$\langle \phi_\eta, \phi_{\eta'} \rangle = \int_{\mathbb{I}} e^{i2\pi\eta\theta}\, e^{-i2\pi\eta'\theta}\, d\theta = \mathrm{I}\{\eta = \eta'\}, \quad \eta, \eta' \in \mathbb{Z}. \tag{A.33}$$

To show completeness it suffices by Proposition 8.5.5 (ii) to show that a square-integrable function g: I → C that satisfies

$$\langle g, \phi_\eta \rangle = 0, \quad \eta \in \mathbb{Z} \tag{A.34}$$

must be equal to the all-zero function except on a subset of I of Lebesgue measure zero. To show this, we note that

$$\langle g, \phi_\eta \rangle = \hat{g}(\eta), \quad \eta \in \mathbb{Z}, \tag{A.35}$$

so (A.34) is equivalent to

$$\hat{g}(\eta) = 0, \quad \eta \in \mathbb{Z}, \tag{A.36}$$

and hence, by Theorem A.2.3, g must be zero except on a set of Lebesgue measure zero. □

Recalling Definition 8.2.1 and Proposition 8.2.2 (d) we obtain that, because the functions …, φ_{−1}, φ_0, φ_1, … form a CONS and because ⟨g, φ_η⟩ = ĝ(η), we have:

Theorem A.3.2. Let g, h: I → C be square integrable. Then

$$\int_{\mathbb{I}} |g(\theta)|^2\, d\theta = \sum_{\eta=-\infty}^{\infty} |\hat{g}(\eta)|^2 \tag{A.37}$$

and

$$\int_{\mathbb{I}} g(\theta)\, h^*(\theta)\, d\theta = \sum_{\eta=-\infty}^{\infty} \hat{g}(\eta)\, \hat{h}^*(\eta). \tag{A.38}$$
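As a small sanity check of (A.37), consider g(θ) = θ on I. Integration by parts gives ĝ(0) = 0 and ĝ(η) = i(−1)^η/(2πη) for η ≠ 0, so |ĝ(η)|² = 1/(4π²η²), while the energy of g over I is 1/12. The sketch below, whose function names are hypothetical, sums the coefficient energies and compares them with 1/12.

```python
import math


def coefficient_energy(eta):
    """|g_hat(eta)|^2 for g(theta) = theta on I = [-1/2, 1/2).
    Derived by hand: g_hat(0) = 0 and |g_hat(eta)| = 1 / (2*pi*|eta|) otherwise."""
    if eta == 0:
        return 0.0
    return 1.0 / (4 * math.pi ** 2 * eta ** 2)


def partial_energy(max_eta):
    """Sum of |g_hat(eta)|^2 over -max_eta <= eta <= max_eta."""
    return sum(coefficient_energy(eta) for eta in range(-max_eta, max_eta + 1))


SIGNAL_ENERGY = 1.0 / 12.0  # integral of theta^2 over [-1/2, 1/2)

if __name__ == "__main__":
    for max_eta in (10, 100, 10000):
        print(max_eta, partial_energy(max_eta), SIGNAL_ENERGY)
```

The partial sums increase toward 1/12 from below, with a gap of roughly 1/(2π² · max_eta).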

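There is nothing special about the interval I, and, indeed, by scaling we obtain:

Theorem A.3.3. Let S be positive.

(i) The bi-infinite sequence of functions defined for every η ∈ Z by

$$s \mapsto \frac{1}{\sqrt{S}}\, e^{i2\pi\eta s/S}\, \mathrm{I}\Bigl\{-\frac{S}{2} \le s < \frac{S}{2}\Bigr\}, \quad s \in \mathbb{R} \tag{A.39}$$

forms a CONS for the class of square-integrable functions that are zero outside the interval [−S/2, S/2).

(ii) If g is square integrable and zero outside the interval [−S/2, S/2), then

$$\int_{-S/2}^{S/2} |g(\xi)|^2\, d\xi = \sum_{\eta=-\infty}^{\infty} \Bigl| \int_{-S/2}^{S/2} g(\xi)\, \frac{1}{\sqrt{S}}\, e^{-i2\pi\eta\xi/S}\, d\xi \Bigr|^2. \tag{A.40}$$

(iii) If g, h: R → C are square integrable and zero outside the interval [−S/2, S/2), then

$$\int_{-S/2}^{S/2} g(\xi)\, h^*(\xi)\, d\xi = \sum_{\eta=-\infty}^{\infty} \Bigl( \int_{-S/2}^{S/2} g(\xi)\, \frac{1}{\sqrt{S}}\, e^{-i2\pi\eta\xi/S}\, d\xi \Bigr) \Bigl( \int_{-S/2}^{S/2} h(\xi)\, \frac{1}{\sqrt{S}}\, e^{-i2\pi\eta\xi/S}\, d\xi \Bigr)^{*}.$$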
Note A.3.4. The theorem continues to hold if we replace the half-open interval with the open interval (−S/2, S/2) or with the closed interval [−S/2, S/2], because the integrals are insensitive to these replacements.

Note A.3.5. We refer to

$$\int_{-S/2}^{S/2} g(\xi)\, \frac{1}{\sqrt{S}}\, e^{-i2\pi\eta\xi/S}\, d\xi$$

as the η-th Fourier Series Coefficient of g with respect to the interval [−S/2, S/2).

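Lemma A.3.6 (A Mini Parseval Theorem).

(i) If

$$x(t) = \int_{-W}^{W} g(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R}, \tag{A.41}$$

where g satisfies

$$\int_{-W}^{W} |g(f)|^2\, df < \infty, \tag{A.42}$$

then

$$\int_{-\infty}^{\infty} |x(t)|^2\, dt = \int_{-W}^{W} |g(f)|^2\, df. \tag{A.43}$$

(ii) If for both ν = 1 and ν = 2

$$x_\nu(t) = \int_{-W}^{W} g_\nu(f)\, e^{i2\pi f t}\, df, \quad t \in \mathbb{R}, \tag{A.44}$$

where the functions g_1, g_2: R → C satisfy

$$\int_{-W}^{W} |g_\nu(f)|^2\, df < \infty, \quad \nu = 1, 2, \tag{A.45}$$

then

$$\int_{-\infty}^{\infty} x_1(t)\, x_2^*(t)\, dt = \int_{-W}^{W} g_1(f)\, g_2^*(f)\, df. \tag{A.46}$$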

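Proof. We first prove Part (i). We begin by expressing the energy in x in the form

\begin{align}
\int_{-\infty}^{\infty} |x(t)|^2\, dt &= \sum_{\ell=-\infty}^{\infty} \int_{-\frac{\ell}{2W}}^{-\frac{\ell}{2W} + \frac{1}{2W}} |x(t)|^2\, dt \notag \\
&= \sum_{\ell=-\infty}^{\infty} \int_{0}^{\frac{1}{2W}} \Bigl| x\Bigl(\alpha - \frac{\ell}{2W}\Bigr) \Bigr|^2\, d\alpha \notag \\
&= \int_{0}^{\frac{1}{2W}} \sum_{\ell=-\infty}^{\infty} \Bigl| x\Bigl(\alpha - \frac{\ell}{2W}\Bigr) \Bigr|^2\, d\alpha, \tag{A.47}
\end{align}

where in the second equality we changed the integration variable to α = t + ℓ/(2W); and where the third equality follows from Fubini's Theorem and the nonnegativity of the integrand. The proof of Part (i) will follow from (A.47) once we show that for every α ∈ R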



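$$\sum_{\ell=-\infty}^{\infty} \Bigl| x\Bigl(\alpha - \frac{\ell}{2W}\Bigr) \Bigr|^2 = 2W \int_{-W}^{W} |g(f)|^2\, df. \tag{A.48}$$

This can be shown by noting that by (A.41)

$$\frac{1}{\sqrt{2W}}\, x\Bigl(\alpha - \frac{\ell}{2W}\Bigr) = \int_{-W}^{W} \frac{1}{\sqrt{2W}}\, e^{-i2\pi f \ell/(2W)}\, e^{i2\pi f \alpha}\, g(f)\, df,$$

so (2W)^{−1/2} x(α − ℓ/(2W)) is the ℓ-th Fourier Series Coefficient of the mapping f ↦ e^{i2πfα} g(f) with respect to the interval [−W, W) and consequently

\begin{align*}
\sum_{\ell=-\infty}^{\infty} \Bigl| \frac{1}{\sqrt{2W}}\, x\Bigl(\alpha - \frac{\ell}{2W}\Bigr) \Bigr|^2 &= \int_{-W}^{W} \bigl| e^{i2\pi f \alpha}\, g(f) \bigr|^2\, df \\
&= \int_{-W}^{W} |g(f)|^2\, df,
\end{align*}

where the first equality follows from Theorem A.3.3 (ii) and the second because the magnitude of e^{i2πfα} is one.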

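To prove Part (ii) we note that by opening the square and then applying Part (i) to the function βx_1 + x_2 we obtain for every β ∈ C

\begin{align*}
&|\beta|^2 \int_{-\infty}^{\infty} |x_1(t)|^2\, dt + \int_{-\infty}^{\infty} |x_2(t)|^2\, dt + 2 \operatorname{Re}\Bigl( \beta \int_{-\infty}^{\infty} x_1(t)\, x_2^*(t)\, dt \Bigr) \\
&\qquad = \int_{-\infty}^{\infty} |\beta x_1(t) + x_2(t)|^2\, dt \\
&\qquad = \int_{-W}^{W} |\beta g_1(f) + g_2(f)|^2\, df \\
&\qquad = |\beta|^2 \int_{-W}^{W} |g_1(f)|^2\, df + \int_{-W}^{W} |g_2(f)|^2\, df + 2 \operatorname{Re}\Bigl( \beta \int_{-W}^{W} g_1(f)\, g_2^*(f)\, df \Bigr).
\end{align*}

Consequently upon applying Part (i) to x_1 and to x_2 we obtain

$$\operatorname{Re}\Bigl( \beta \int_{-\infty}^{\infty} x_1(t)\, x_2^*(t)\, dt \Bigr) = \operatorname{Re}\Bigl( \beta \int_{-W}^{W} g_1(f)\, g_2^*(f)\, df \Bigr), \quad \beta \in \mathbb{C},$$

which implies

$$\int_{-\infty}^{\infty} x_1(t)\, x_2^*(t)\, dt = \int_{-W}^{W} g_1(f)\, g_2^*(f)\, df. \qquad \Box$$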
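As a worked instance of Part (i), chosen here purely for illustration, take g(f) = 1 for |f| ≤ W. Then (A.41) gives a sinc-type signal, and both sides of (A.43) equal 2W; the last step uses the standard integral of sin²(at)/t² over the real line, which equals πa for a > 0.

```latex
\begin{align*}
x(t) &= \int_{-W}^{W} e^{i2\pi f t}\, df = \frac{\sin(2\pi W t)}{\pi t}, \quad t \neq 0,
  \qquad x(0) = 2W, \\
\int_{-W}^{W} |g(f)|^2\, df &= 2W, \\
\int_{-\infty}^{\infty} |x(t)|^2\, dt
  &= \frac{1}{\pi^2} \int_{-\infty}^{\infty} \frac{\sin^2(2\pi W t)}{t^2}\, dt
   = \frac{1}{\pi^2} \cdot \pi \cdot 2\pi W = 2W.
\end{align*}
```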
Corollary A.3.7.

(i) Let y: R → C be of finite energy, and let T > 0 be arbitrary. Let

$$g(f) = \int_{-T}^{T} y(t)\, e^{-i2\pi f t}\, dt, \quad f \in \mathbb{R}.$$

Then

$$\int_{-T}^{T} |y(t)|^2\, dt = \int_{-\infty}^{\infty} |g(f)|^2\, df.$$

(ii) Let the signals x_1, x_2: R → C be of finite energy, and let T > 0. Define

$$g_\nu(f) = \int_{-T}^{T} x_\nu(t)\, e^{-i2\pi f t}\, dt, \quad (\nu = 1, 2,\ f \in \mathbb{R}).$$

Then

$$\int_{-T}^{T} x_1(t)\, x_2^*(t)\, dt = \int_{-\infty}^{\infty} g_1(f)\, g_2^*(f)\, df.$$

Proof. Part (i) follows from Lemma A.3.6 (i) by substituting T for W; by substituting y for g; and by swapping the dummy variables f and t. Part (ii) follows analogously. □

A.4 Pointwise Reconstruction

If g: I → C is periodically continuous, then we can reconstruct its value at every point from its Fourier Series Coefficients:

Theorem A.4.1 (Reconstructing Periodically Continuous Functions). Let the function g: I → C be periodically continuous. Then

$$\lim_{n \to \infty} \max_{\theta \in \mathbb{I}} \bigl\{ |g(\theta) - \sigma_n(g, \theta)| \bigr\} = 0. \tag{A.49}$$

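Proof. Let g_P denote the periodic extension of g. Then for every θ ∈ I,

\begin{align}
g(\theta) - \sigma_n(g, \theta) &= g(\theta) - \int_{\mathbb{I}} k_n(\vartheta)\, g_P(\theta - \vartheta)\, d\vartheta \notag \\
&= \int_{\mathbb{I}} k_n(\vartheta) \bigl(g_P(\theta) - g_P(\theta - \vartheta)\bigr)\, d\vartheta, \tag{A.50}
\end{align}

where the first equality follows from the definition of σ_n(g, θ) (A.12) and the second from (A.11b). Consequently, for every θ ∈ I,

\begin{align}
|g(\theta) - \sigma_n(g, \theta)| &\le \int_{\mathbb{I}} k_n(\vartheta)\, |g_P(\theta) - g_P(\theta - \vartheta)|\, d\vartheta \notag \\
&= \int_{-\delta}^{\delta} k_n(\vartheta)\, |g_P(\theta) - g_P(\theta - \vartheta)|\, d\vartheta \notag \\
&\quad + \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta)\, |g_P(\theta) - g_P(\theta - \vartheta)|\, d\vartheta, \quad 0 < \delta < \frac{1}{2}. \tag{A.51}
\end{align}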
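We next treat the two integrals separately. For the first we have for every θ ∈ I and every 0 < δ < 1/2,

\begin{align}
\int_{-\delta}^{\delta} k_n(\vartheta)\, |g_P(\theta) - g_P(\theta - \vartheta)|\, d\vartheta
&\le \max_{|\vartheta'| \le \delta} \bigl\{ |g_P(\theta) - g_P(\theta - \vartheta')| \bigr\} \int_{-\delta}^{\delta} k_n(\vartheta)\, d\vartheta \notag \\
&\le \max_{|\vartheta'| \le \delta} \bigl\{ |g_P(\theta) - g_P(\theta - \vartheta')| \bigr\}, \tag{A.52}
\end{align}

where the first inequality follows from the nonnegativity of k_n(·) (A.11a), and where the second inequality follows because k_n(·) is nonnegative and integrates over I to one (A.11b). For the second integral in (A.51) we have for every θ ∈ I and every 0 < δ < 1/2,

$$\int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta)\, |g_P(\theta) - g_P(\theta - \vartheta)|\, d\vartheta \le 2 \max_{\theta' \in \mathbb{I}} \bigl\{ |g(\theta')| \bigr\} \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta)\, d\vartheta, \tag{A.53}$$

where the maximum on the RHS is finite because g is periodically continuous. Combining (A.51), (A.52), and (A.53) we obtain for every 0 < δ < 1/2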

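$$\max_{\theta \in \mathbb{I}} \bigl\{ |g(\theta) - \sigma_n(g, \theta)| \bigr\} \le \max_{\theta \in \mathbb{I}} \max_{|\vartheta| \le \delta} \bigl\{ |g_P(\theta) - g_P(\theta - \vartheta)| \bigr\} + 2 \max_{\theta' \in \mathbb{I}} \bigl\{ |g(\theta')| \bigr\} \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta)\, d\vartheta. \tag{A.54}$$

Because g(·) is periodically continuous it follows that its periodic extension g_P is uniformly continuous. Consequently, for every ε > 0 we can find some δ > 0 such that

$$\max_{|\vartheta| \le \delta} |g_P(\theta) - g_P(\theta - \vartheta)| < \epsilon, \quad \theta \in \mathbb{I}. \tag{A.55}$$

By letting n tend to infinity in (A.54) we obtain from (A.11c) and (A.55)

$$\varlimsup_{n \to \infty} \max_{\theta \in \mathbb{I}} \bigl\{ |g(\theta) - \sigma_n(g, \theta)| \bigr\} < \epsilon,$$

which establishes the result because ε > 0 was arbitrary. □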
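Uniform convergence as in (A.49) can be observed on a simple periodically continuous function. The sketch below uses g(θ) = |θ|, whose coefficients, computed by hand, are ĝ(0) = 1/4, ĝ(η) = −1/(π²η²) for odd η, and zero for even nonzero η. The function names and the evaluation grid are assumptions made for illustration.

```python
import cmath
import math


def abs_coefficient(eta):
    """Fourier Series Coefficient of g(theta) = |theta| on I, computed by hand."""
    if eta == 0:
        return 0.25
    if eta % 2 == 0:
        return 0.0
    return -1.0 / (math.pi ** 2 * eta ** 2)


def fejer_mean(n, theta):
    """sigma_n(g, theta) for g(theta) = |theta|, evaluated through (A.13)."""
    total = sum((1 - abs(eta) / (n + 1)) * abs_coefficient(eta)
                * cmath.exp(2j * math.pi * eta * theta)
                for eta in range(-n, n + 1))
    return total.real  # the coefficients are real and even, so the sum is real


def max_error(n, num_points=400):
    """Largest |g - sigma_n(g, .)| over an evenly spaced grid on I."""
    grid = [-0.5 + k / num_points for k in range(num_points)]
    return max(abs(abs(theta) - fejer_mean(n, theta)) for theta in grid)


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, max_error(n))
```

The grid includes the kinks at θ = 0 and θ = −1/2, where the approximation error is largest; the printed maxima shrink as n grows.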

As a corollary we obtain: 

Corollary A.4.2 (Weierstrass's Approximation Theorem). Every periodically continuous function from I to C can be approximated uniformly using trigonometric polynomials.
Theorems Referenced by Name

Bernstein's Inequality                              Theorem 6.7.1
Bochner's Theorem                                   Theorem 25.8.1
Cauchy-Schwarz Inequality                           Theorem 3.3.1
Cauchy-Schwarz Inequality for Random Variables      Theorem 3.5.1
Characterization of Shift-Orthonormal Pulses        Corollary 11.3.4
Covariance Inequality                               Corollary 3.5.2
Dominated Convergence Theorem                       (Rudin, 1974, Theorem 1.34)
Factorization Theorem                               Theorem 22.3.1
Fubini's Theorem                                    See Section 2.6
Hölder's Inequality                                 Theorem 3.3.2
Kolmogorov's Existence Theorem                      Theorem 25.2.1
L2-Sampling Theorem                                 Theorem 8.4.3
Minimum Bandwidth Theorem                           Corollary 11.3.5
Nyquist's Criterion                                 Theorem 11.3.2
Parseval's Theorem                                  Theorem 6.2.9
Pointwise Sampling Theorem                          Theorem 8.4.5
Pythagorean Theorem                                 Theorem 4.5.2
Riesz-Fischer Theorem                               Theorem 8.5.3
Sandwich Theorem                                    Chapter 8, Footnote 5
Triangle Inequality for Complex Numbers             (2.11) and (2.12)
Triangle Inequality in L2                           (4.12) and (4.14)
Union-of-Events Bound (or Union Bound)              Theorem 21.5.1
Wiener-Khinchin Theorem                             Theorem 25.14.1
Abbreviations



Abbreviations in Mathematics 

CDF Cumulative Distribution Function 

CONS Complete Orthonormal System 

CRV Complex Random Variable 

CSP Complex Stochastic Process 

FDD Finite-Dimensional Distribution 

FT Fourier Transform 

IFT Inverse Fourier Transform 

IID Independent and Identically Distributed 

LHS Left-Hand Side 

MGF Moment Generating Function 

PDF Probability Density Function 

PMF Probability Mass Function 

PSD Power Spectral Density 

RHS Right-Hand Side 

RV Random Variable 

SP Stochastic Process 

WSS Wide-Sense Stationary 

Abbreviations in Communications 

BER Bit Error Rate 

BPF Bandpass Filter 

LPF Lowpass Filter 

M-PSK M-ary Phase Shift Keying 

PAM Pulse Amplitude Modulation 

PSK Phase Shift Keying 

QAM Quadrature Amplitude Modulation 

QPSK Quadrature Phase Shift Keying
List of Symbols

General

A ⇒ B       Statement B is true whenever Statement A is true.
A ⇔ B       Statement A is true if, and only if, Statement B is true.
∑           Summation.
∏           Product.
≜           Equal by definition.
□           End of proof.



Sets 

∅           Empty set.
{· : ·}     The set of all objects described before the colon that satisfy the condition stated after the colon.
#A          Number of elements of the set A.
a ∈ A       Set membership: a is an element of A.
a ∉ A       Exclusion: a is not an element of A.
A ⊂ B       Proper subset: every element of A is an element of B but some elements of B are not elements of A.
A ⊆ B       Subset: every element of A is also an element of B.
B \ A       Setminus: {b ∈ B : b ∉ A}.
A^c         Set-complement.
A △ B       Symmetric Set Difference: (A \ B) ∪ (B \ A).
A × B       Cartesian product: {(a, b) : a ∈ A, b ∈ B}.
A^n         n-fold Cartesian product: A × A × ⋯ × A (n times).
A ∩ B       Intersection: {ξ ∈ A : ξ ∈ B}.
A ∪ B       Union: elements of A or B.

Specific Sets

N           Natural Numbers: {1, 2, …}.
Z           Integers: {…, −2, −1, 0, 1, 2, …}.
R           Real Numbers.
C           Complex Numbers.
F_2         Binary Field (Section 29.2).
I           Unit interval [−1/2, 1/2); see (A.1).

Intervals and Some Functions 

≤, <; ≥, >  Inequality signs.
+∞, −∞, ∞   Infinities.
[a, b]      Closed interval: {ξ ∈ R : a ≤ ξ ≤ b}.
[a, b)      Interval open on the right: {ξ ∈ R : a ≤ ξ < b}.
(a, b]      Interval open on the left: {ξ ∈ R : a < ξ ≤ b}.
(a, b)      Open interval: {ξ ∈ R : a < ξ < b}.
[0, ∞]      Nonnegative reals including infinity: {ξ ∈ R : ξ ≥ 0} ∪ {∞}.
⌊ξ⌋         Floor: the largest integer not larger than ξ.
⌈ξ⌉         Ceiling: the smallest integer not smaller than ξ.
max         Maximum.
min         Minimum.
sup         Least upper bound.
inf         Greatest lower bound.



Complex Numbers 

C           Complex field.
Re(z)       Real part of z.
Im(z)       Imaginary part of z.
|z|         Modulus of z.
z*          Complex conjugate of z.
D(z_0, r)   Open disc: {z ∈ C : |z − z_0| < r}.

Limits

a_n → a               Convergence: the sequence a_1, a_2, … converges to a.
lim_{n→∞} a_n         Limit: the limit of a_n as n tends to infinity.
→                     Converges to.
$\varlimsup_{n\to\infty} a_n$   Upper limit (limit superior).
$\varliminf_{n\to\infty} a_n$   Lower limit (limit inferior).

Defining and Operating on Functions 

g: T> — > TZ Function of name g, domain T>, and range TZ. 

g: t 1— » £ 2 Function of name g mapping t to t 2 . (Domain & range un- 
specified.) 

goh Composition: £ 1— > g(h(£)). 

d Differentiation operator. 

~ g ' ( X ) Partial derivative of g(-) with respect to x^K 
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"3( x ) Jacobian matrix. 

9x 

j v Integral over the region T>. 

f(0\c— a The evaluation of the function £ i— > g(£) at a. 

g(£)\ The evaluation </(&) — g(a). 

Function Norms, Relations, and Equivalence Classes 

llxH, See (2.6). 

||x|| g See (3.12). 

||x|| M See (A.14). 

x = y x and y are indistinguishable; see Definition 2.5.2. 

[u] The equivalence class of x; see (4.60). 

Function Spaces 

Ci Integrable functions from 1 to C or M to t (depending on 

context); see Sections 2.2 and 2.3. 
C2 Square-integrable functions from K to C or R to R (depending 

on context); see Section 3.1. 
L2 Collection of equivalence classes of square-integrable functions; 

see Section 4.7. 

Special Functions 

I{statement}    Indicator function. Its value is 1 if the statement is true and 0 otherwise.
0               All-zero function: t ↦ 0.
n!              n factorial: 1 × 2 × ⋯ × n.
$\binom{n}{k}$  Number of subsets of {1, …, n} containing k (distinct) elements (= n!/(k!(n − k)!)).
√ξ              Nonnegative square root of ξ.
cos(·)          Cosine function (argument in radians).
sin(·)          Sine function (argument in radians).
sinc(·)         Sinc function; see (5.20).
tan⁻¹(·)        Inverse tangent.
Q(·)            Q-function; see (19.9).
Γ(·)            Gamma function; see (19.39).
I_0(·)          The zeroth-order modified Bessel function; see (27.47).
ln(·)           Natural logarithm (base e).
exp(·)          Exponential function: exp(ξ) = e^ξ.
ξ mod [−π, π)   The element of [−π, π) that differs from ξ by an integer multiple of 2π.
Operations on Signals

$\overleftarrow{\mathbf{x}}$   The mirror image of x; see (5.1).
x̂           The Fourier Transform of the signal x; see (6.1).
x̌           Inverse Fourier Transform of x; see (6.4).
⟨x, y⟩      Inner product between the signals x and y; see (3.1) and (3.4).
x ⋆ y       Convolution of x with y; see (5.2).
x + y       The signal t ↦ x(t) + y(t).
αx          The scaling of the signal x by complex or real number α, i.e., the signal t ↦ αx(t).
R_xx        Self-similarity function of signal x.
ĝ(η)        The η-th Fourier Series Coefficient; see (A.2).



Filters 

LPFw c (0 Frequency response of a unit- gain lowpass filter of cutoff fre- 

quency W c . That is, LPF Wc (/) = I{|/| < W c }. 

LPFw c (') Impulse response of a unit-gain lowpass filter of cutoff fre- 

quency W c . That is, LPF Wc (i) = 2W c sinc(2W c i). 

BPFw f (•) Frequency response of a unit-gain bandpass filter of band- 

width W around the carrier frequency f c . That is, the mapping 
of / to l{||/| - / c | < W/2}. It is assumed that f c > W/2. 

BPFvv,/ c ( - ) Impulse response of a unit-gain bandpass filter of band- 

width W around the carrier frequency f c . That is, the mapping 
of t to 2Wcos(2tt/ c £) sinc(Wi). It is assumed that f c > W/2. 



PAM Signaling 

g or Pulse shape; see Section 10.7. 

T s Baud period; see Section 10.7. 

1/T S Baud rate. 

X Constellation; see Section 10.8. 

S Minimum distance of a constellation; see Section 10.8. 

enc(-) Block encoder; see Definition 10.4.1 and (18.3). 

x(t;d) Transmitted signal at time t when the data are d; see (28.6). 
QAM Signaling

g or φ      Pulse shape; see Sections 16.3 & 16.5.
T_s         Baud period; see Section 16.3.
1/T_s       Baud rate.
C           Constellation; see Section 16.7.
δ           Minimum distance of a constellation; see Section 16.7.
enc(·)      Block encoder; see (18.3).
x(t; d)     The transmitted signal at time t when the data are d; see (28.31).



Matrices

n × m matrix    A matrix with n rows and m columns.
0               The all-zero matrix.
I_n             The n × n identity matrix.
(A)_{k,ℓ}       The Row-k Column-ℓ component of the matrix A.
A*              Componentwise complex conjugate.
A^T             Transpose of A.
A†              Hermitian conjugate of A.
tr(A)           Trace of A.
det(A)          Determinant of A.
Re(A)           Componentwise real part of A.
Im(A)           Componentwise imaginary part of A.
A ⪰ 0           A is a positive semidefinite matrix.
A ≻ 0           A is a positive definite matrix.



Vectors 



o 

a T 

IN 

(a,b) E 
d E (a,b) 



Set of column vectors of n real components. 

Set of column vectors of n complex components. 

The all-zero vector. 

The j'-component of the column vector a. 

The transpose of the vector a. 

Euclidean norm of a; see (20.85). 

Euclidean inner product; see (20.84). 

Euclidean distance between a and b, i.e., ||a — b 



Linear Algebra 

span(vi, . . . , v 
Dim(V) 
Ker(T) 
Image (T) 



Linear subspace spanned by the n-tuple (vi, 
Dimension of the subspace V. 
Kernel of the linear mapping T(-). 
Image of the linear mapping T(-). 



); see (4.8). 
Probability



Px(-) 
Px.y{-r) 
Px\y(-\-) 
fx(-) 

fx\Y{-\-) 
fx\A 
Fx(-) 
*x(0 

Mx(-) 
E[X] 
Var[X] 
Cov[X,F] 

EM-] 

Pr(-) 
Pr(.|-) 
Pr[-] 
Pr[-|.] 

X^-Y^-Z 

{Xk} 

X ~ Distribution 

Bernoulli (p) 

U{A) 
AA c (0, K) 

A^,K) 



Probability triplet; see Page 3. 

Probability Mass Function (PMF) of X. 

Joint PMF oi{X,Y). 

Conditional PMF of X given Y. 

Probability density function of X. 

Joint PDF oi(X,Y). 

Conditional PDF of X given Y. 

Conditional PDF of X given the event A. 

Cumulative distribution function of X . 

Characteristic function of X. 

Moment generating function of X; see (19.23). 

Expectation of X; see (17.9). 

Variance of X; see (17.14a). 

Covariance between X and Y; see (17.17). 

Conditional expectation. 

Probability of an event. 

Conditional probability of an event. 

Probability that a RV satisfies some condition. 

Conditional version of Pr[-]. 

Equal in law. 

X and Z are conditionally independent given Y. 

Sequence of random variables . . . , X_i, X , Xi, . . . 

X has the specified distribution. 

Noncentral \ 2 distribution with n degrees of freedom 

and noncentrality parameter A. 

Bernoulli distribution (takes on the values and 1 with 

probabilities p and 1 — p). 

Uniform distribution over the set A. 

Multivariate proper complex Gaussian distribution of 

covariance K; see Note 24.3.13. 

Multivariate real Gaussian distribution of mean fj, and 

covariance K. 



Stochastic Processes 



(X(n)), {X n , neZ) 

(x(t)), (x(t), teR 

Kxx 

Pxx(-) 



Discrete-time stochastic process. 
Continuous-time stochastic process. 
Autocovariance function. 
Power spectral density (PSD). 
Correlation function. 
Hypothesis Testing

B_{m,m'}            The subset of R^d defined in (21.33).
H                   RV to be guessed in binary hypothesis testing.
LLR(·)              Log likelihood-ratio function; see (20.41).
LR(·)               Likelihood-ratio function; see (20.38).
$\mathsf{M}$        Number of hypotheses in multi-hypothesis testing.
$\mathcal{M}$       Set of hypotheses {1, …, M}.
$M$                 RV to be guessed in multi-hypothesis testing.
φ_Guess             Generic guessing rule; see Sections 20.2 & 21.2.
φ*_Guess            Generic optimal guessing rule.
φ_MAP               MAP Decision Rule.
φ_ML                Maximum-Likelihood Rule.
p*(error)           Optimal probability of error.
p_MAP(error|·)      Conditional probability of error of MAP rule.



The Binary Field and Binary Tuples 

F 2 Binary field {0,1}. 

F£ The set of binary K-tuples. 

Addition in F 2 ; see (29.3). 

Multiplication in F 2 ; see (29.4). 

dfj(u,v) Hamming distance; see Section 29.2.4. 

wh(u) Hamming weight; see Section 29.2.4. 

Y and T^ Antipodal mappings (29.14) and (29.17). 

Coding 

A Kt o Binary N-tuples whose K-th component is zero; see (29.61). 

A K ,i Binary N-tuples whose K-th component is one; see (29.64). 

c Generic element of Image(T). 

dmin.H Minimum Hamming distance; see (29.54). 

enc Encoder. 

p* R Optimal probability of error in guessing the K-th data bit. 

PMAp(error|D = d) Conditional probability of error of the MAP rule designed 

to minimize block errors. 

V-d(-) See (29.77). 

x Generic element of Image(enc). 

x^(d) The 7/-th symbol in the N-tuple enc(d). 
absolute value, 2, 16
affine transformation 

of a multivariate Gaussian, 473 

of a scalar, 341 

of a univariate Gaussian, 341 

of a vector, 473 
all-zero 

function, 3 

matrix, 456 

signal, 3 
all-zero codeword assumption, 675-680 
almost sure convergence 

of random variables, 356 

of random vectors, 487 
amplification, 3, 27 
analytic continuation, 350, 351n 
analytic function, 60 
analytic representation, 109, 135 

of an energy-limited signal, 135 
characterization, 135 
definition, 135 

of an integrable signal, 109-116 
characterization, 110 
definition, 110 
inner products, 114 
recovering from, 113 
analytic signal, see analytic representation 
antipodal mapping, 653, 656 
argument, 65n 

Arithmetic-Geometric Inequality, 421 
assuming the all-zero codeword, 675-680 
autocorrelation function, 211, see also self-similarity function
autocovariance function 

of a continuous-time SP, 517 

of a discrete-time CSP, 300 

of a discrete-time SP, 211 
average probability of a bit error, 637 



Bölcskei, Helmut, xxiv
band-edge symmetry, 193 
bandlimited stochastic process, 252 
bandpass filter, 61, see also ideal unit-gain 

bandpass filter 
bandwidth, 680 

around a carrier, 101, 104 
of a product, 90-92 
of a stochastic process, 252 
of baseband representation 
energy-limited signal, 137 
integrable signal, 122 
of energy-limited signal, 81 
of integrable signal, 89 
Barker code, 264 

baseband representation, 101, 109, 116, 
136, 162 
FT of, 117 

inner product, 125, 137, 276-278 
of convolution, 126, 137 
of energy-limited signal, 136-139 
characterization, 136, 137 
definition, 136 
inner product, 137 
properties, 138 
recovering from, 137 
sampling of, see complex sampling 
of filter's output, 128, 137 
of integrable signal, 116-129 
characterization, 120, 123 
definition, 116 
FT of, 117 
inner product, 126 
recovering from, 123 
of QAM, 267 

sampling of, see complex sampling 
basis, 29, 144, 144n 
baud period, 680 
in PAM, 177 
in QAM, 268 
baud rate

in PAM, 177 
in QAM, 268 
Baudot, J.M.E, 177n 
BER, 637 
Bernoulli, 442 

Bernstein's Inequality, 92-93, 614 
Bessel function, 355, 624 
Bhattacharyya Bound, 373, 419-421 
bi-infinite block-mode 
with PAM 

definition, 229 

operational PSD, 255 

power, 229 
with QAM 

operational PSD, 318 

power, 313 
bi-orthogonal code, 684 
bi-orthogonal keying, 596-599 
BIBO stable, see stable 
Biglieri, Ezio, xxiv 
binary field, 654 

binary hypothesis testing, 360-403 
binary- input/output-symmetric, 676 
binary-to-complex block encoder, 308 
binary-to-reals block encoder, 173, 229 
Binomial Expansion, 592 
bit error, 654 
bit error rate, 637, 674 
bit rate, 680 
block error, 654 
block-encoder 

binary-to-complex 

definition, 308 

rate, 308 
binary-to-reals 

definition, 173 

rate, 173 
block-mode, 172-174, 313 
blocklength, 657 
Boche, Holger, xxiv 
Bochner's Theorem, 526 
Bonferroni Inequalities, 429 
Boole's Inequality, 414n 
Borgmann, Moritz, xxiv 
bounded-input/bounded-output stable, see 

stable 
Boyd, Stephen, xxiv 
Brandle, Marion, xxiv 
Braendle, Samuel, xxiv 
Brickwall function, 75 
FT of, 67, 75-76 



IFT of, 67, 75-76 
Bross, Shraga, xxiv 



C, 1 

Cantor set, 8n 

carrier frequency, 103, 161, 161n 

Cauchy-Riemann equations, 291 

Cauchy-Schwarz Inequality, 18-22 

for d-tuples, 25 

for random variables, 23 

for sequences, 25 
causal filter, 58 
causality, 182 
centered complex Gaussian 

random variable, 500 

random vector, 504 
centered Gaussian 

random variable, 341 

random vector, 454 
centered stochastic process, 203 
central chi-square distribution, 352-356 
Central Limit Theorem, 339 
change of variable 

complex random variable, 291 

complex vector, 296, 305 

real vector, 290 
characteristic function 

of a central χ², 353

of a complex random variable, 289 

of a complex random vector, 295 

of a pair of real random variables, 289 

of a real Gaussian RV, 351 

of a real Gaussian vector, 475 

of a real random variable, 350 

of a real random vector, 468-469 

of a squared Gaussian, 352 
charge density, 245 
circular symmetry, 494-511 

of a complex Gaussian, 502 

of a complex Gaussian vector, 507-509 

of a complex random vector 
and linear functionals, 503 
and linear transformations, 503 
and properness, 504 
definition, 502 

of a CRV 

and expectation, 495 
and properness, 499 
characterization, 498 
definition, 495 
clock, 613 
closed subspace, 144n, 152
code property, 659 
Coding Theory, 653 
coherent decoder, 628 
colored noise, 599-604 
compact support 

function of, 330 
complete (space), 71, 152 
complete orthonormal system, see CONS 
complex conjugate 
of a matrix, 284 
of a scalar, 15 
complex dimensions per second, 266, 271 
complex Gaussian 

random variable, 499-502, 511 
centered, 500 
circularly-symmetric, 502 
definition, 500 
proper, 500 
random vector, 504 

and linear transformations, 505 
centered, 504 
characterization, 505 
circularly-symmetric, 507 
definition, 504 
proper, 505, 507-509 
complex magnitude, see absolute value 
complex modulus, see absolute value 
complex positive semidefinite matrix, 304, 

507 
complex random variable, see CRV 
complex random vector 

characteristic function, 295 
circularly-symmetric, see circular symmetry
covariance matrix, 293 
definition, 292 
expectation, 293 
finite variance, 293 
proper, 293-295 
transforming, 296, 305 
complex sampling, 122, 162-163 

reconstruction from, 163-166 
complex signal, 3 

complex stochastic process, see CSP 
complex symbols per second, 266 
complex-valued signal, 3 
componentwise antipodal mapping, 656 
composite hypothesis testing, 430n, 614 
composition (of functions), 2 
conditional 

distribution, 363-364, 483 



independence, 379 
probability, 406 
conjugate (of a matrix), 284 
conjugate-symmetric, 65, 108 
conjugate-symmetric matrix, 284 
CONS, 143-159 

characterization, 145 

definition, 144 

for closed subspaces, 155 

for energy-limited signals that are bandlimited to W Hz, 148, 149

for energy-limited signals that vanish outside an interval, 147
Prolate Spheroidal Wave Functions, 
157 
consistency property (of FDDs), 513 
constellation 

M-PSK, 274 
of PAM, 177-181 
definition, 177 
minimum distance, 178 
normalization, 178 
number of points, 178 
second moment, 178 
of QAM, 274 
definition, 274 
minimum distance, 274 
number of points, 274 
second moment, 274 
QPSK, 274 
square 4-QAM, 274 
convergence of random variables 
almost surely, 356 
in distribution, 357 
in mean square, 356 
in probability, 356 
with probability one, 356 
convergence of random vectors 
almost surely, 487 
in distribution, 488 
in mean square, 487 
in probability, 487 
with probability one, 487 
convolution, 53-63, 68, 139 

baseband representation of, 126, 137 
between real and complex signals, 121 
FT of, 77 
limits of, 327 
uniformly continuous, 55 
correlation coefficient, 23 
covariance 

between two CRVs, 288-289 
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between two RVs, 23 
Covariance Inequality, 23 
covariance matrix 

and positive semidefinite matrices, 471 
of a complex random vector, 293 
of a real random vector, 464 
singular, 466-468 
covariance stationary, see WSS 
Craig's formula, 345 
CRV, 283-306 

argument, 290 
characteristic function, 289 
circularly-symmetric, see circular symmetry
covariance, 288-289 
definition, 285 
density, 286 
distribution, 285 
expectation, 283, 286 
magnitude, 290 
proper, 287-288 
transforming, 289, 291 
variance, 283, 287 
cryptography, 496 
CSP 

centered, 297 
continuous time 
measurable, 315n 
operational PSD, 315 
definition, 297 
discrete-time, 297-306 

autocovariance function, 300 
covariance stationary, see WSS 
proper, 298 
PSD, 300 

second-order stationary, see WSS 
spectral distribution function, 303 
stationary, see stationary 
strict-sense stationary, see stationary
strongly stationary, see stationary 
weakly stationary, see WSS 
wide-sense stationary, see WSS 
WSS, see WSS 
finite variance, 297 
cumulative distribution function, 343 
cyclostationary, 245n 

D 

de Caen's Inequality, 429 
decision rule, see guessing rule 
decoding rule, see guessing rule 



degree-n trigonometric polynomial, 686 
degrees of freedom 

of a central χ² distribution, 353

of a noncentral χ² distribution, 354

of a signal, 98 
delay 

in PAM, 181 

introduced by channel, 613 
Dembo, Amir, xxiv 
dense subset of ℒ₁, 330
detection in white Gaussian noise, 562-612 

M-PSK, 588-590 

antipodal signaling, 586-587 

bi-orthogonal keying, 596-599 

binary signaling, 586-588 

in passband, 584-585 

optimal decision rule, 572-576 

orthogonal keying, 590-593 

probability of error, 576-577 

signals of infinite bandwidth, 604-605 

simplex, 593-596 

sufficient statistics, 567-572 
differentiable complex function, 290 
digital implementation, 182 
dimension, 30, 657 
Dirac's Delta, 3 

discrete-time single-block model, 642 
distance spectrum, see weight enumerator 
domain, 2 

Dominated Convergence Theorem, 702 
dual code, 683 
duality, 151 
Durisi, Giuseppe, xxiv 
dynamic range, 582 

E 

eigenvalue, 459 
eigenvector, 459 
encoder property, 659 
energy 

in baseband and passband, 126, 138 

in PAM, 220-223 

in QAM, 307-310 

of a complex signal, 16 

of a real signal, 14 
energy per bit 

in PAM, 222 

in QAM, 310 
energy per complex symbol 

in PAM, 337 

in QAM, 310 
energy per symbol 
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in PAM, 222 

in QAM, 310 
energy-limited passband signal, see passband signal
energy-limited signal, 16 

that is bandlimited, 47, 79-87 
bandwidth of, 81 
continuity of, 84 
definition, 80 
of zero energy, 80 
through a stable filter, 

through an ideal unit-gain LPF, 85 
entire function, 60, 93-96 

of exponential type, 96 
Ephraim, Yariv, xxiv 
equal law 

complex random variables, 285 

complex random vectors, 292, 295 

random variables, 208 

random vectors, 208 
equalization, 649 
equivalence class, 49-50, 70 
equivalence relation, 48 
essential supremum, 50n 
estimation 

and conditional expectation, 486 

jointly Gaussian vectors, 486 
Estimation Theory, 486 
Euler's Identity, 121 
event, 3, 201 

excess bandwidth, 193, 196, 271, 680 
exclusive-or, 448, 565n, 654 
expectation 

of a complex random vector, 293 

of a CRV, 286 

of a random matrix, 464 

of a random vector, 463 
expected energy, 221 
experiment outcome, 3, 201 
exponential distribution, 353 



F

F₂, 654

Factorization Theorem, 433-435 

FDD, 204, 512-515 

consistency property, 513 

of a continuous-time Gaussian SP, 515 

symmetry property, 513 
Fejér's kernel, 687
field, 654 
filter, 58-61 



baseband representation of output, 
128, 137 

causal, 58 

front-end, see front-end filter 

stable, 58 

whitening, see whitening filter 
finite-dimensional distribution, see FDD 
finite-variance

complex random vector, 293 

complex stochastic process, 297 

continuous-time real SP, 512 

random vector, 464 
Fisher, R. A., 451 
Forney, David Jr., xxiv 
Fourier Series, 147-148, 686-696 

CONS, 691 

pointwise reconstruction, 695 

reconstruction in L₁, 688
Fourier Series Coefficient, 148, 686, 693 
Fourier Transform, 64-100 

boundedness, 73 

conjugate-symmetric, 65, 101, 108-109 

continuity, 73 

definition 

for elements of L₂, 71
for signals in ℒ₁, 64

of sinc(·), 70, 76

of a product, 90 

of baseband representation, 117 

of convolution, 77 

of real signals, 65 

of symmetric signals, 65 

of the Brickwall function, 67 

preserves inner products, 65, 67-69 

properties, 67 

reconstructing from, 74 

reconstructing using IFT, 74, 75 
frequency response, 77 

of ideal unit-gain BPF, 79 

of ideal unit-gain LPF, 78 

with respect to a band, 129 
front-end filter, 582-584 
FT, see Fourier Transform 
Fubini's Theorem, 10, 11, 69 
function, 14 

all-zero, 3 

domain, 2, 14 

energy-limited, 15, 16 

image, 2 

injective, 172 

integrable, 5 

Lebesgue measurable, 4 
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notation, 2 
one-to-one, 172 
onto, 2 
range, 2, 14 
surjective, 2 

G 

Gallager, Robert, xxiv 
Galois Field, 654 
Gamma function, 353 
Gaussian 

complex random vector, see complex Gaussian
continuous-time SP, 515-516, 518-520, 
524, 537-552, 554-558 
definition, 515 
FDDs, 515 
filtering, 546, 552 
linear functionals, 537-545 
PSD, 524 
stationary, 518 
white, xix, 554-558 
CRV, see complex Gaussian 
random variable, 341 

and affine transformations, 341 
characteristic function, 351 
convergence, 356-357 
density, 342 
MGF, 349 

standard, see standard Gaussian 
random vector, 455 

a canonical representation, 478-481 
and affine transformations, 473 
and pairwise independence, 477 
centered, 454 

characteristic function, 475 
convergence, 487-488 
density, 481-482 
linear functionals of, 482-483 
moments, 486 

standard, see standard Gaussian 
generalized Rayleigh distribution, 354 
generalized Rice distribution, 355 
generator matrix, 659 
GF(2) 

addition, 654 
multiplication, 654 
Gram-Schmidt procedure, 44-48 
guessing rule 

definition, 361, 405 

MAP, see MAP 

maximum a posteriori, see MAP 



maximum likelihood, see ML 

ML, see ML 

optimal, 362, 405 

probability of error, 362, 405 

randomized, 368-370, 408 

with random parameter, 396-398 

H 

Hösli, Daniel, xxiv

Hadamard code, 684 

half-normal, 351n

Hamming and Euclidean distance, 657 

Hamming code, 683 

Hamming distance, 656 

Hamming weight, 656 

hard decisions, 681 

Hellinger distance, 403 

Herglotz's Theorem, 217 

Hermite functions, 99 

Hermitian conjugate, 284 

Hermitian matrix, 284 

Hilbert Transform, 139 

Hilbert Transform kernel, 140 

Ho, Minnie, xxiv 

holomorphic function, see analytic function 

hypothesis testing 

M-ary, see multi-hypothesis testing 
binary, see binary hypothesis testing 



I

ideal unit-gain bandpass filter, 61, 79, 103

frequency response, 61, 79 

impulse response, 61 

is not causal, 61 

is unstable, 61 
ideal unit-gain lowpass filter, 60 

cutoff frequency, 60 

frequency response, 60, 78 

impulse response, 60 

is not causal, 60 

is unstable, 60 
IID random bits, 229 
image 

of a linear transformation, 655 

of a mapping, 2 
impulse response, 58 
in-phase component, 121, 122, 137 

of energy-limited signal, 137 

of integrable signal, 122 
independent random variables, 378, 476 
independent stochastic processes, 515 
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indistinguishable, 48 
infinite divisibility, 359
injective, 172 
inner product, 14-25 

and baseband representation, 126 

and QAM, 275-280 

and the analytic representation, 114 

and the baseband representation, 125, 
137, 276-278 

between complex signals, 15 

between real signals, 14 

between tuples, 392n 

properties, 16 
integrable 

complex functions, 17 

complex signal, 5 

passband signal, see passband signal 
integrable signal 

definition, 5 

that is bandlimited, 87-89 
bandwidth of, 89 
through a stable filter, 90 
integral 

of a complex signal, 5-6 
definition, 5 
properties, 6 

of a real signal, 4 
inter-symbol interference, xix, 649 
Inverse Fourier Transform, 65 

definition, 66 

of symmetric signals, 65 

of the Brickwall function, 67 

properties, 66 
irrelevant data, 447-449 

and random parameters, 450 
isomorphism, 156, 166 
Itô Calculus, 605



J

joint distribution function, 513n
jointly Gaussian random vectors, 483-486 
and estimation, 486 

K 

kernel, 655 

Kim, Young-Han, xxiv 

Koch, Tobias, xxiv 

Koksal, Emre, xxiv 

Kolmogorov's Existence Theorem, 513 

Kolmogorov, A. N., 363, 451 

Kontoyiannis, Ioannis, xxiv 



L

ℒ₁, 5

ℒ₁-Fourier Transform, 64

ℒ₂, 15, 26-51, 70

L₂, 43, 48-50, 70

L₂-Fourier Transform, 70-73

definition, 71

properties, 71
ℒ₂-Sampling Theorem, 151, 162, 164

for passband signals, 165 
Laneman, Nicholas, xxiv 
Lapidoth, Danielle, xxv 
Laplace Transform, 349 
Lebesgue integral, 4 
Lebesgue measurable 

complex signal, 5 

real signal, 4 
Lebesgue null set, see set of Lebesgue measure zero
length of a vector, 30 
likelihood-ratio function, 371 
linear (K, N) F₂ code, 657
linear (K, N) F₂ encoder, 657
linear binary code with antipodal signaling, 
653-682 

definition, 660 

minimizing block errors, 666-671 
max-correlation decision rule, 667 
optimal decoding, 666 
probability of a block error, 668-671 

power, 661, 664 

PSD, 664 
linear binary encoder with antipodal signaling

definition, 659 

minimizing bit errors, 671-675 
optimal decoding, 671 
probability of a bit error, 672-675 
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linear functionals 

of a Gaussian SP, 537-545 

of a SP, 530-545 

on F₂^K, 661
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definition, 482 
of Gaussian vectors, 483 
linear mapping, 655 
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log likelihood-ratio function, 372 
look-up table, 182 
low-density parity-check codes, 682 
lowpass filter, 60, see also ideal unit-gain lowpass filter
LR(·), 371

M 
magnitude, see absolute value 
MAP, 370-372, 408-409 
mapping, see function 
Markov chain, 379, 439-440 
mass density, 246 
mass line density, 247 
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matched filter, 58-60, 175, 176 

and inner products, 59-60 

definition, 59 
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conjugation, 284 

Hermitian, 284 

Hermitian conjugate, 284 

orthogonal, 458 

positive definite, 461 

positive semidefinite 
complex, 304, 507 
real, 461 

self-adjoint, 284 

symmetric, 284 

Toeplitz, 304 

transpose, 284 
matrix representation of an encoder, 658 
maximum a posteriori, see MAP 
maximum distance separable (MDS), 685 
maximum likelihood, see ML 
maximum-correlation rule, 424, 574, 575 
measurable 

complex signal, 5 

complex stochastic process, 315n 

real signal, 4 

stochastic process, 238, 529 
memoryless 

binary-input/output-symmetric, 676

property of the exponential, 205 
message error, 637, 654 
MGF, 349 

definition, 349 

of a central chi-square, 353 

of a Gaussian, 349 

of a noncentral chi-square, 354 

of a squared Gaussian, 352 



of the sum of independent RVs, 354 
Miliou, Natalia, xxiv 
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Minimum Shift Keying, 608 
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M-PSK, 274, 410-414, 418-419, 588-590 
multi-dimensional hypothesis testing 

M-ary, 421-427 

binary, 390-396 
multi-hypothesis testing, 404-429 
multiplication by a carrier 

doubles the bandwidth, 105 

FT of the result of, 105 
multivariate Gaussian, 454-493, see also 
Gaussian 
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nuisance parameter, see random parameter 
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Nyquist pulse, 189 
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observation, 361, 

one-to-one, 172 

open subset, 289, 289n 

operational PSD, 245-264 
and the PSD, 552 
definition, 250-252 
of a CSP, 315 
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of PAM, 252-264 

of QAM, 315-320 

uniqueness, 252 
optimal guessing rule, 362, 405 
orthogonal 

binary tuples, 683 

real passband signals, 126 

signals, 32 
orthogonal keying, 590-593 

noncoherent detection, 613-631 
orthogonal matrix, 458 
orthonormal 

basis, 36-48 

construction, 45 
definition, 37 
existence, 43 

tuple, 36 



P

packet, 639

pairwise independence, 476 
pairwise sufficiency, 435-439 
Paley-Wiener, 95-96
PAM, 176-184, 220-244, 634-642 
baud period, 177 
baud rate, 177 
constellation, 177-181 
definition, 177 
minimum distance, 178 
normalization, 178 
number of points, 178 
second moment, 178 
detection in white noise, 634-642 
digital implementation, 182 
energy, 220-223 
energy per bit, 222 
energy per symbol, 222 
operational PSD, 252-264 
power, 223-244 
pulse shape, 177 
spectral efficiency, 266 
parity-check matrix, 659 
Parseval's Theorem, 72, 115 
Parseval-like theorems, 67-69 
passband signal, 101-141 

analytic representation of, 109, 135 
definition, 103 
energy-limited, 101, 130-139 

bandwidth around a carrier, 104 
baseband representation of, 136 
characterization, 131, 133 
definition, 103 



is bandlimited, 133 
sampling, 161-168 
through BPF, 134 
integrable, 101 

analytic representation of, 110 
bandwidth around a carrier, 104 
baseband representation of, 116 
characterization, 103 
definition, 103 
inner product, 114 
is bandlimited, 104 
is finite-energy, 104 
through stable filter, 104 
sampling, 161-168 
periodic extension, 686 
periodically continuous, 686 
phase shift keying, see M-PSK 
picket fences, 96-98 
picket-fence miracle, 96 
π/4-QPSK, 337
Plackett's Identities, 492 
Plancherel's Theorem, 72 
Pointwise Sampling Theorem, 151, 163 

for passband signals, 165 
Poisson distribution, 355 
Poisson summation, 96-98 
positive definite function 
from R to C, 199
from R to R, 521
from Z to C, 300
from Z to R, 212
positive definite matrix, 461 
positive semidefinite matrix 
complex, 304, 305, 507 
real, 461 
power 

in baseband and passband, 311, 320-327
in PAM, 223-244 
in QAM, 310-314 
of a SP, 238 
power spectral density, see PSD 
Price's Theorem, 492 
prior 

definition, 361, 404 
nondegenerate, 361, 404 
uniform, 361, 404 
probability density function, 247 
probability of error 

binary hypothesis testing 
Bhattacharyya Bound, 373 
general decision rule, 366 
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in IID Gaussian noise, 393-395 

in white noise, 586, 587 

no observables, 362 

noncoherent, 626-627 

optimal, 367, 407 
multi-hypothesis testing 

M-PSK, 412-414 

8-PSK, 590 

bi-orthogonal keying, 599 

in IID Gaussian noise, 425-427 

no observables, 406 

noncoherent, 630-631 

orthogonal keying, 592 

simplex, 596 

Union Bound, 416-419 

Union-Bhattacharyya Bound, 419-421
probability space, 3, 201 
processing, 376-381, 409 
projection 

as best approximation, 145 

onto a finite-dimensional subspace, 40 

onto a vector in ℒ₂, 34



onto a vector in 



34 



onto an infinite-dimensional subspace, 
159 
Prolate Spheroidal Wave Functions, 157 
proper 

complex Gaussian RV, 500 

complex Gaussian vector, 505, 507-509

complex random vector, 293-295 

CRV, 287-288 

discrete-time CSP, 298 
PSD 

of a continuous-time SP, 523, 552-554 

of a discrete-time CSP, 300 

of a discrete-time SP, 213-218 
pulse amplitude modulation, see PAM 
pulse shape 

in PAM, 177 

in QAM, 268 
Pythagoras's Theorem, 32 
Pythagorean Theorem, 33 

Q 

QAM, 265-282, 307-338, 642-649 
bandwidth, 270 

baseband representation of, 267 
baud period, 268 
baud rate, 268 
constellation, 274 



definition, 274 
minimum distance, 274 
M-PSK, 274 
number of points, 274 
second moment, 274 
square 4-QAM, 274 
detection in white noise, 642-649 
energy, 307-310 
energy per bit, 310 
energy per symbol, 310 
inner products, 275-280 
operational PSD, 315-320 
power, 310-314 
pulse shape, 268 
spectral efficiency, 273-274 
symbol recovery, 275-280 
Q-function, 344-348 
QPSK, 274 
quadrature amplitude modulation (QAM), see QAM
quadrature component, 121, 122, 137 
of energy-limited signal, 137 
of integrable signal, 122 
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radially-symmetric function, 495 
Radon-Nikodym Theorem, 405n 
raised-cosine, 196 

random function, see stochastic process 
random parameter, 396-398, 449-451, 617 

and white noise, 613-631 
random process, see stochastic process 
random variable, 3, 201 
random vector 

characteristic function, 468-469 

covariance matrix, 464 

finite variance, 464 
randomized decision rule, 368-370, 408 
randomized guessing rule, see randomized decision rule
rate, 173 

in bits per complex symbol, 172, 268 

in bits per real symbol, 172 
Rayleigh distribution, 354 
real dimensions per second, 177 
real passband signals 

analytic representation, see analytic 
representation 

baseband representation, see baseband 
representation 

condition for orthogonality, 126

Sampling Theorem, see Sampling Theorem
real positive semidefinite matrix, 461 
real signal, 2 

real symbols per second, 177 
real-valued signal, 2 
reflection, 3, 53, see mirror image 
repetition code, 683 
representation of an encoder by a matrix, 
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Rice distribution, 355 
Riemann 

integrable, 4 

integral, 4, 6 
Riemann-Lebesgue Lemma, 690 
Riesz-Fischer Theorem, 153 
Rimoldi, Bixio, xxiv 
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sample function, see sample-path 
sample of a stochastic process, 202 
sample-path, 201 

sample-path realization, see sample-path 
sampling as an isomorphism, 156 
Sampling Theorem, 75, 143, 148-157 

for passband signals, 161-168 

isomorphism, 156 

ℒ₂, 151

pointwise, 151 
Sandwich Theorem, 154, 154n 
Sanjoy, Mitter, xxiv 
Sason, Igal, xxiv 

second-order stationary, see WSS 
self-adjoint matrix, 284 
self-similarity function 

of energy-limited signal, 186-188 
definition, 186 
properties, 186 

of integrable signal 
definition, 198 
FT of, 198 
set of Lebesgue measure zero, 7, 9 
Shannon, Claude E., xvii, 171 
Shrader, Brook, xxiv 
σ-algebra

generated by a SP, 514 

generated by RVs, 364n 

generated by the cylindrical sets, 514 

product, 238, 238n 
signal 



complex, 14 
real, 14 
signature, 243 
simplex, 593-596 
simulating observables, 441-443 
sinc(·), 75

definition, 60 
FT of, 70, 76 
single parity check code, 657 
Singleton Bound, 681, 685 
singular covariance matrix, 466-468 
Slepian's Inequality, 492 
soft decisions, 681 
SP, see stochastic process 
span, 29 

spectral efficiency, 266, 273-274 
Spectral Theorem, 460 
stable filter, 58 
standard complex Gaussian 
random variable, 494-495 
and properness, 495 
definition, 494 
density, 494 
mean, 495 
variance, 495 
random vector, 502 

covariance matrix, 502 
definition, 502 
density, 502 
mean, 502 
proper, 502 
standard deviation, 342 
standard Gaussian 

complex vector, see standard complex Gaussian
CRV, see standard complex Gaussian 
random variable 
CDF, 343 
definition, 339 
density, 339 
moments, 351 
random vector 

covariance matrix, 470 
definition, 454 
density, 469 
mean, 470 
standard inner product, 392n 
stationarization argument, 257 
stationary 

continuous-time SP, 516 
discrete-time CSP, 297 
discrete-time SP, 208, 209 
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stochastic process, 171, 201-207 
centered, 203 
complex, see CSP 
continuous-time, 512-561 

autocovariance function, 517, 520-522

average power, 528-530 

bandlimited, 252 

bandwidth, 252 

centered, 512 

covariance stationary, see WSS 

definition, 204 

FDD, 512-515 

filtering, 546-552 

finite variance, 512 

finite-dimensional distribution, see 
FDD 

Gaussian, see Gaussian 

independence, 515 

linear functionals, 530-545 

measurable, 529 

path, 512 

PSD, 523, 552-554 

realization, 512 

sample-function, 512 

sample-path, 512 

second-order stationary, see WSS 

spectral distribution function, 525-528

state at time-t, 512

stationary, see stationary 

strict-sense stationary, see stationary

strongly stationary, see stationary 

time-t sample, 512

trajectory, 512 

weakly stationary, see WSS 

wide-sense stationary, see WSS 

WSS, see WSS 
definition, 203 
discrete-time, 208-219 

autocorrelation function, 211 

autocovariance function, 211-218 

covariance stationary, see WSS 

definition, 203 

one-sided, 204 

power spectral density, 213-218 

second-order stationary, see WSS 

spectral distribution function, 217-218

stationary, see stationary 



strict-sense stationary, see stationary
strongly stationary, see stationary 
weakly stationary, see WSS 
wide-sense stationary, see WSS 
WSS, see WSS 
finite variance, 203 
measurable, 238 
power of, 238 
zero mean, 203 
strict-sense stationary, see stationary 
strictly stationary, see stationary 
strictly systematic encoder, 659 
strongly stationary, see stationary 
subspace, 28, 143 

closed, see closed subspace 
finite-dimensional, 29, 143 
basis for, 29 
dimension of, 30 
having an orthonormal basis, 40 
projection onto, 40 
infinite-dimensional, 29, 143 
sufficient statistics, 381-389, 430-453 

and computability of the a posteriori law, 386, 431
and noncoherent detection, 616-621 
and pairwise sufficiency, 435-439 
and random parameters, 449 
and simulating observables, 441-443 
and the likelihood-ratio function, 383 
factorization criterion, 433-435 
for detection in additive white noise, 567-572
in binary hypothesis testing, 381-389 
Markov condition, 439-440 
observation SP, 563-567 
PAM in white noise, 635-642 
QAM in white noise, 642-649 
random parameters and white noise, 
617 
superposition, 3, 20, 26 
support 

compact, 330 
of a PSD, 329 
symmetric matrix, 284 
symmetric random variable, 217 
symmetric set difference, 565 
symmetry property (of FDDs), 513 
systematic encoder, 659 
systematic single parity check encoder, 657 
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Toeplitz matrix, 304 

total positivity of order 2, 355, 491 

trajectory, see sample-path 

transforming 

complex random variables, 289, 291 

complex random vectors, 296, 305 

real random vectors, 290 
transpose (of a matrix), 284 
Triangle Inequality 

for complex numbers, 6 

for signals, 30 

for stochastic processes, 320 
trigonometric polynomial, 686 
tuple 

of bits, 173 

of signals, 28 
turbo-codes, 682 

U 

uniformly continuous, 55, 55n 
Union Bound, 414-421 
Union-Bhattacharyya Bound, 419-421 
Union-of-Events Bound, see Union Bound 
univariate Gaussian, 339-359, see also 
Gaussian 



W

Wagner's rule, 667
Wang, Ligong, xxiv
weak convergence

of random variables, 357

of random vectors, 488
weakly stationary, see WSS
Weierstrass's Approximation Theorem, 696
weight enumerator, 670
wheel-of-fortune, 496
white Gaussian noise, xix, 554-558

definition, 555

detection in, 562-612

in passband, 558

properties, 555
white noise, see white Gaussian noise
white noise paradigm, 605
whitening filter

definition, 600

existence, 604
Wick's Formula, 486
wide-sense stationary, see WSS
Wiener-Khinchin Theorem, 257, 552
worst-case performance, 628
WSS

continuous-time SP, 517

discrete-time CSP, 297, 298

discrete-time SP, 209-218
624



